Guest post by Dr. Gilbert Saggia, Country Manager, Oracle Kenya.
Large companies realize that the big data bell tolls for them, not just for startups. 2015 will be a big year for big data in the enterprise. Here are Oracle’s top 7 predictions:
Corporate boardrooms will talk about data capital, not big data. Data is now a kind of capital. It’s as necessary for creating new products, services and ways of working as financial capital. For CEOs, this means securing access to, and increasing use of, data capital by digitizing and datafying key activities with customers, suppliers and partners before rivals do. For CIOs, this means providing data liquidity – the ability to get data the firm wants into the shape it needs with minimal time, cost and risk.
Big data management will grow – up. Hadoop and NoSQL will graduate from experimental pilots to standard components of enterprise data management, taking their place alongside relational databases. Over the course of the year, early majority firms will settle on the best roles for each of these foundational components. The demand for data liquidity will compel architects to find new ways to make the full big data environment – Hadoop, NoSQL, and relational technologies – act as a mature enterprise-grade system.
Companies will demand a SQL for all seasons. SQL is not just a technology standard. It’s a language based on 100 years of hard thinking about how to think straight about data. Applications, analysts, and algorithms rely on it daily to run everything from fraud analyses to freight forwarding. Companies will demand that SQL work with all big data, not just data in a Hadoop, NoSQL (Oh, the irony!), or relational silo. They’ll also demand that this big data SQL work just like the full-fledged modern SQL their applications and developers already use. This will put pressure on nascent Hadoop-only SQL to mature overnight.
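As a hypothetical illustration of the "one SQL for all silos" idea, the sketch below uses two attached SQLite databases to stand in for a relational warehouse and a Hadoop/NoSQL store; the table names and data are invented. The point is that a single, standard SQL statement joins across both stores, which is the experience companies will demand from big data SQL engines.

```python
import sqlite3

# Illustrative only: two SQLite databases stand in for two data silos,
# a relational warehouse and a Hadoop-resident event log.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("ATTACH DATABASE ':memory:' AS eventlog")  # stand-in for the Hadoop silo

warehouse.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
warehouse.execute("CREATE TABLE eventlog.clicks (customer_id INTEGER, url TEXT)")
warehouse.executemany("INSERT INTO customers VALUES (?, ?)",
                      [(1, "Amina"), (2, "Brian")])
warehouse.executemany("INSERT INTO eventlog.clicks VALUES (?, ?)",
                      [(1, "/pricing"), (1, "/signup"), (2, "/docs")])

# One standard SQL statement spans both "silos": a join plus an aggregate.
rows = warehouse.execute("""
    SELECT c.name, COUNT(*) AS clicks
    FROM customers c
    JOIN eventlog.clicks k ON k.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Amina', 2), ('Brian', 1)]
```

In a real deployment the two stores would be separate systems rather than attached databases, but the query an analyst writes should look no different.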
Just-in-time transformation will transform ETL. New in-memory streaming technologies change the rate at which we can act on data, prompting a re-examination of extract, transform, and load (ETL) activities. Data scientists will increasingly opt for real-time data replication tools over the batch-oriented tools that have been the norm for getting data into Hadoop. They’ll also take advantage of distributed in-memory processing to make data transformation fast enough to support interactive exploration, creating new data combinations on the fly.
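A minimal sketch of the just-in-time idea: instead of a nightly batch job that transforms everything at once, each record is parsed and enriched the moment it streams in, so new combinations can be built on the fly during exploration. All names here (`event_stream`, `enrich`, the sample records) are illustrative, not a real product API; in practice the source might be a replication or streaming tool such as Kafka or Oracle GoldenGate.

```python
import json
from typing import Iterator

def event_stream(raw_lines) -> Iterator[dict]:
    """Lazily parse raw log lines as they arrive (the 'extract' step)."""
    for line in raw_lines:
        yield json.loads(line)

def enrich(events: Iterator[dict], region_by_store: dict) -> Iterator[dict]:
    """Transform each event the moment it is read, not in a batch."""
    for e in events:
        e["region"] = region_by_store.get(e["store"], "unknown")
        yield e

# Simulated incoming stream; in production this would be a live feed.
raw = [
    '{"store": "NBO-1", "amount": 120}',
    '{"store": "MBA-2", "amount": 80}',
]
regions = {"NBO-1": "Nairobi", "MBA-2": "Mombasa"}

pipeline = enrich(event_stream(raw), regions)
results = [(e["region"], e["amount"]) for e in pipeline]
print(results)  # [('Nairobi', 120), ('Mombasa', 80)]
```

Because the pipeline is built from lazy generators, nothing is materialized until it is consumed, which is what makes interactive, exploratory transformation feasible.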
Self-service discovery and visualization tools will come to big data. New data discovery and visualization tools will help people with expertise in the business but not in technology use big data in daily decisions. Much of this data will come from outside the firm and, therefore, beyond the careful curation of enterprise data policies. To simplify this complexity, these new technologies will combine consumer-grade user experience with sophisticated algorithmic classification, analysis and enrichment under the hood. The result for business users will be easy exploration of big data, like knowing where the oil is before drilling the well.
Security and governance will increase big data innovation. Many large firms have found their big data pilots shut down by compliance officers concerned about legal or regulatory violations. This is particularly an issue when creating new data combinations that include customer data. In a twist, firms will find big data experimentation easier to open up when the data involved is more locked-down. This means extending modern security practices like data masking and redaction to the full big data environment, in addition to the must-haves of access, authorization and auditing.
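A minimal sketch of what masking and redaction might look like before customer data enters a shared big data environment. The field names and rules below are invented for illustration; real platforms apply such policies in the data layer rather than in application code.

```python
import hashlib

def mask_record(record: dict) -> dict:
    """Return a copy safe for experimentation: direct identifiers
    are redacted, tokenized, or truncated."""
    masked = dict(record)
    # Redact the name outright.
    masked["name"] = "REDACTED"
    # Tokenize the email with a one-way hash so joins on it still
    # work across datasets without exposing the address itself.
    masked["email"] = hashlib.sha256(record["email"].encode()).hexdigest()[:12]
    # Keep only the last four digits of the card number.
    masked["card"] = "****" + record["card"][-4:]
    return masked

# Illustrative customer record.
customer = {"name": "Amina O.", "email": "amina@example.com",
            "card": "5399123412345678", "city": "Nairobi"}
print(mask_record(customer))
```

Note that non-identifying fields such as the city pass through untouched, which is what keeps the masked data useful for analysis while satisfying the compliance officer.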
Production workloads will blend cloud and on-premises capabilities. Once companies see enterprise security and governance extended to high-performance cloud environments, they’ll start to shift workloads around as needed. For example, an auto manufacturer that wants to combine dealer data born in the cloud with vehicle manufacturing data in an on-premises warehouse may ship the warehouse data to the cloud for transformation and analysis, only to send the results back to the warehouse for real-time querying.