From 12:30 pm to 1:15 pm
The business success of Big Data will ultimately depend on our ability to realign and place it within a more generalized architectural framework, one that brings together the strategic, technical and management elements of data warehousing (DW 3.0), business intelligence, textual analysis and statistical analysis into a coherent, synergistic and usable whole.
In simple terms the classes of data in focus can be defined as follows:
Enterprise Operational Data – This is data that is used in applications that support the day-to-day running of an organisation's operations.
Enterprise Process Data – This is measurement and management data collected to show how the operational systems are performing.
Enterprise Information Data – This is primarily data collected from internal and external data sources, the most significant source typically being Enterprise Operational Data.
These classes of data are integrated, orchestrated and channeled in a data architecture that includes the three key data integration elements of data sourcing, structuring and data-based statistical analysis:
Data Sources – This element covers all the current sources, varieties and volumes of data available which may be used to support the processes of 'challenge identification', 'option definition' and decision making, including statistical analysis and scenario generation.
Core Data Warehousing – This is a suggested evolution path of the DW 2.0 model. It extends the Inmon paradigm to include not only unstructured and complex data but also the information and outcomes derived from statistical analysis performed outside of the Core Data Warehousing landscape.
Core Statistics – This element covers the core body of statistical competence, especially but not only with regard to evolving data volumes, data velocity, data quality and data variety.
In terms of data and its integration and correlation, this paper will focus on Complex data, Event data and Infrastructure data:
Complex Data – This is unstructured data, or data with a highly complex structure, contained in documents and other complex data artefacts, such as multimedia and ECM documents.
Event Data – This is an aspect of Enterprise Process Data, typically at a fine-grained level of abstraction. It includes business process logs, internet web activity logs and other similar sources of event data. The volumes generated by these sources tend to be higher than those of other data, and are the ones currently associated with the term Big Data, covering as it does the masses of information generated by tracking even the most minor piece of 'behavioural data' from, for example, someone visiting a web site.
Infrastructure Data – This aspect includes data which could well be described as signal data: continuous, high-velocity streams of potentially volatile data that might be processed through complex event correlation and near-real-time analysis components.
The talk will conclude with a review of the business, technical and architectural imperatives:
Without a business imperative there is no business reason to do it: What does this mean? Well, it means that for every significant action or initiative, even a highly speculative initiative, there must be a tangible and credible business imperative to support that initiative. The difference is as clear as that found between the Sage of Omaha and Santa Claus.
All architectural decisions are based on a full and deep understanding of what needs to be achieved and of all of the available options: For example, a decision to reject a high performance database management product must be made for sound reasons, even if that sound reason is cost. It should not be based on technical opinions such as "I don't like the vendor, much". If a flavour of Hadoop makes absolute sense, then use it; if Exasol or Oracle or Teradata make sense, then use them. You have to be technology agnostic, not a dogmatic technology fundamentalist.
That statistics and non-traditional data sources are fully integrated into the future Data Warehousing landscape architectures: Building even more corporate silos, whether through action or omission, will lead to greater inefficiencies, greater misunderstanding and greater risk.
The architecture must be coherent, usable and cost-effective: If not, what's the point, right?
That no technology, technique or method is discounted: We need to be able to cost-effectively incorporate any relevant, existing and emerging technology into the architectural landscape.
Reduce early and reduce often: Massive volumes of data, especially at high speed, are problematic. Reducing those volumes, even if we can't theoretically reduce the speed, is absolutely essential. I will elaborate on this point and the following one separately.
That only the data that is required is sourced. That only the data that is required is forwarded: Again, this harks back to the need for clear business imperatives, tied to the good sense of only shipping data that needs to be shipped.
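As a rough illustration of the last two imperatives, the following is a minimal sketch in Python, not a prescribed implementation. The event layout, the field names in REQUIRED_FIELDS and the functions source_required and reduce_early are all hypothetical, invented here purely for illustration. It assumes web activity events arrive as simple records, sources only the event type and fields that a business imperative calls for, and reduces them to per-page counts before anything is forwarded.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical: the only fields a (hypothetical) business imperative requires.
REQUIRED_FIELDS = ("user_id", "page")


def source_required(events: Iterable[dict], event_type: str = "page_view") -> Iterator[dict]:
    """Source only events of the required type, keeping only the required fields."""
    for event in events:
        if event.get("type") == event_type:
            yield {field: event[field] for field in REQUIRED_FIELDS}


def reduce_early(events: Iterable[dict]) -> Counter:
    """Reduce fine-grained events to one count per page before forwarding."""
    return Counter(event["page"] for event in events)


# Illustrative sample data only; not taken from any real system.
raw_events = [
    {"type": "page_view", "user_id": 1, "page": "/home", "user_agent": "A"},
    {"type": "mouse_move", "user_id": 1, "x": 10, "y": 20},
    {"type": "page_view", "user_id": 2, "page": "/home", "user_agent": "B"},
    {"type": "page_view", "user_id": 1, "page": "/pricing", "user_agent": "A"},
]

summary = reduce_early(source_required(raw_events))
print(dict(summary))  # {'/home': 2, '/pricing': 1}
```

In this sketch four raw events become two summary rows, and the fields nobody asked for never leave the source. Whether counts per page are the right reduction depends entirely on the business imperative in question.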