"Only I can change my life. No one can do it for me."
Building a Data Warehouse on Microsoft Platform
Solution Architecture definition
Picture bellow represents the typical, well known data warehouse architecture.
On the left side you have operational data sources. ETL process feeding data warehouse from operational sources. Data in the data warehouse transformed and well formatted represents reference information repository (“single version of truth of enterprise data”) are available for use in a business models or some third party applications. On the right side you have a front and layer responsible for data warehouse interaction with business professionals within the organization.
Such solution architecture can be implemented with a following application types:
- Data Integration – represents data extraction from operational sources, transformation in a format suitable for later use and loading data into the data warehouse. Data cleansing is also element of this application type. You can use Microsoft SQL Server Integration Services (SSIS) to build this application type..
- OLAP and Data Mining – represents multidimensional applications and data mining models. This application layer leveraging on data warehouse and allows high level of data analysis with features like multidimensional analysis, drill down, drill up, drill through, etc. Data mining allows usage of mathematical and statistical algorithms in order to find patterns in large volume of data. Microsoft SQL Server Analysis Services (SSAS) is a perfect tool, fully competent for development of such application types.
- Front end - represents user interface, point of end users interaction with the data warehouse. High-quality user interface should be fully functional, easy for use with a great data visualization features. Combination of tools Microsoft Office (Excel), Microsoft SQL Server Reporting Services (SSRS) and Microsoft Share point Portal are quite sufficient to satisfy for high-quality user interface.
Let’s find bellow particular aspects of development for each application type.
Data Integration
Important Aspects of Data Integration
Since that Data Integration phase, also known as ETL (Extract Transform Load), consume more about 70% of effort and time in the data warehouse projects, therefore is reasonably important to define factors required for successful completion of the process.
There are top 5 tips for successful Data Integration:
- Chose the right tool – large volume of data and complex transformations requires powerful tool available to dealing with these challenges
- Use a lot metadata – since that essence of data integration is composed by mapping of operational systems to destination as well as data transformation, store all of that in metadata instead of applications. Over the time you will get benefits through maintenance lower costs.
- Engage experienced developers – they are expensive but several times more efficient then inexperienced
- Chose the right destination data model - operational sources you cannot choose, but good destination models allows you less complex transformations and better information potential
- Establish data governance – during ETL process, it is extremely important and quite efficient to establish data governance within the organization if no any. Good Data Governance policy and procedures is essential environment for the ETL process during the operational work.
In this article I would focus on technical aspects of data integration process.
Right Tool: SQL Server Integration Services (SSIS)
Microsoft Integration Services is a platform for building enterprise-level data integration and data transformations solutions. You use Integration Services to solve complex business problems by copying or downloading files, sending e-mail messages in response to events, updating data warehouses, cleaning and mining data, and managing SQL Server objects and data. The packages can work alone or in concert with other packages to address complex business needs. Integration Services can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations.
Integration Services includes a rich set of built-in tasks and transformations; tools for constructing packages; and the Integration Services service for running and managing packages. You can use the graphical Integration Services tools to create solutions without writing a single line of code; or you can program the extensive Integration Services object model to create packages programmatically and code custom tasks and other package objects.
Build Your Own Components
Integration Services (SSIS) is an open environment, which means that you can build your own SSIS components. That can be very important in case that you consider development of SSI components that would improve efficiency and effectiveness of your particular data warehouse implementation. For instance, you can build components to manage data warehouse surrogate keys or resolve data warehouse classifications (leveraging on Meta model). Picture bellow show the same package which is developed by using Built In and Custom developed SSIS components.
OLAP and Data Mining
OLAP and Multidimensional Analysis
Microsoft SQL Server Analysis Services (SSAS)—Multidimensional Data allows you to design, create, and manage multidimensional structures that contain detail and aggregated data from multiple data sources, such as relational databases, in a single unified logical model supported by built-in calculations. Analysis Services—Multidimensional Data provides fast, intuitive, top-down analysis of large quantities of data built on this unified data model, which can be delivered to users in multiple languages and currencies. Analysis Services—Multidimensional Data works with data warehouses, data marts, production databases and operational data stores, supporting analysis of both historical and real time data.
Variety of development features
Cubes are foundation of multidimensional analysis. Building a Cube with SSAS is easy even if a Cube has complex structure with huge number of measures, dimensions an hierarchies.. There are plenty of features you could use to build comprehensive, flexible and high performance cubes.
Let’s consider some of them.
Multiple measure groups can be a part of single physical cube gives you possibility to create very complex cubes that could cover one or even more business processes. Consistence of relation between measure groups and dimension are provided with dimension usage feature. Calculations (Calculated Measures) are based on powerful MDX (Multidimensional expressions) which allows you to manipulate with multidimensional objects. Moreover using MDX expressions you can define KPI (Key Performance Indicators) as a objects that can be easily used in front end layer. Possibility to creating perspectives is another great feature of SSAS. Since that Cubes can be a quite complex, Perspectives feature allows you to create a number of logical views on top of your physical Cube. Translation feature provides Multilanguage support for your Cube on the easy and transparent way. If we add, on top of these nice features, possibility to create different king of hierarchies when you build shared dimensions, there is no doubt that Microsoft have a powerful weapon for response to his competitors on BI tool market. SSAS applications are working on top of star/snowflake schemas (with multiple fact tables) which are derived from data warehouse layer.
Data Mining Features
Data Mining features are embedded in SQL Server Analysis Services (SSAS). There are 9 embedded data mining algorithms (Microsoft Association Algorithm, Microsoft Clustering Algorithm, Microsoft Decision Trees Algorithm, Microsoft Decision Trees Algorithm Technical Reference, Microsoft Linear Regression Algorithm, Microsoft Logistic Regression Algorithm, Microsoft Neural Network Algorithm, Microsoft Sequence Clustering Algorithm, Microsoft Time Series Algorithm) divided into 5 categories :
- Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset. An example of a classification algorithm is the Microsoft Decision Trees Algorithm.
- Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset. An example of a regression algorithm is the Microsoft Time Series Algorithm.
- Segmentation algorithms divide data into groups, or clusters, of items that have similar properties. An example of a segmentation algorithm is the Microsoft Clustering Algorithm.
- Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis. An example of an association algorithm is the Microsoft Association Algorithm.
- Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a Web path flow. An example of a sequence analysis algorithm is the Microsoft Sequence Clustering Algorithm.
Choosing the best algorithm to use for a specific business task can be a challenge. While you can use different algorithms to perform the same business task, each algorithm produces a different result, and some algorithms can produce more than one type of result. For example, you can use the Microsoft Decision Trees algorithm not only for prediction, but also as a way to reduce the number of columns in a dataset, because the decision tree can identify columns that do not affect the final mining model.
Data Mining Application consists of Mining Structures. Each Mining structure consist Mining Models. Mining models can predict values, produce summaries of data, and find hidden correlations. Powerful Data Mining extensions (DMX) allow you to manipulate with data mining objects.
Front End
Front end is quite important part of the entire solution. The reason is because front end is everything that end users can see – often they cannot recognize the complexity of the whole system, which is generally good. Moreover it is very common satiation that users perceive quality of the whole solution through quality of front end part (usability, features, attractiveness, data visualization, etc). In a last decade there is a flood of BI front end tools with so many fancy features, great data visualizations, etc. But, at the end, the first question that business professionals ask their trainers is: “Can I do export of this data in Excel?”.
Microsoft Excel – The Backbone of Microsoft Platform Front End
Microsoft put a lot of investments in Excel in order to make him as a powerful BI client. Using Microsoft Excel you can create predefined reports as well as free forma analysis. Free form analysis in facts represents Microsoft Excel pivot table aligned with SSAS Cube data with great filtering, drill down and drill through features as well as variety of conditional formatting and other data visualization features. Business users are finally able to create they own reports without asking for help IT professionals. And best of all, they don’t need additional training – they using Microsoft excel every day.
So, if you require powerful BI client and wish to acquire your business users – use Microsoft Excel.
The Excel pivot table has long been one of the most popular BI tools for data analysts, and the most frequently used client tool for SQL Server OLAP cubes. With Excel 2007, its integration with Analysis Services has been revamped to comprehensively support SQL Server-specific features. We will explore how this integration helps to improve the user experience. Many other enhanced Excel 2007 features, such as charting, filtering and formatting, bring significant benefits to BI usage scenarios as well. In addition, the new Excel Services feature of SharePoint 2007 allows spreadsheets to be published to a SharePoint server, and be viewed by users via web browser.
Reporting Services – Web based
For less demanding web based reporting you can use Microsoft SQL Server Reporting Services (SSRS) which provides a full range of ready-to-use tools and services to help you create, deploy, and manage reports for your organization, as well as programming features that enable you to extend and customize your reporting functionality.
SQL Server 2008 Reporting Services (SSRS) is a server-based reporting platform that provides comprehensive reporting functionality for a variety of data sources. Reporting Services includes a complete set of tools for you to create, manage, and deliver reports, and APIs that enable developers to integrate or extend data and report processing in custom applications. Reporting Services tools work within the Microsoft Visual Studio environment and are fully integrated with SQL Server tools and components.
With Reporting Services, you can create interactive, tabular, graphical, or free-form reports from relational, multidimensional, or XML-based data sources. You can publish reports, schedule report processing, or access reports on-demand. Reporting Services also enables you to create ad hoc reports based on predefined models, and to interactively explore data within the model. You can select from a variety of viewing formats, export reports to other applications, and subscribe to published reports. The reports that you create can be viewed over a Web-based connection or as part of a Microsoft Windows application or SharePoint site. Reporting Services provides the key to your business dat
Integration and Performance Dashboards: Share Point Portal
So far, we have almost whole solution completed. Just need to design and develop performance dashboards for different kind of users and integrate all front end parts into one single point of interaction, personalized according to particular user requirements. And that job is reserved to Share Point Portal.
Microsoft Office SharePoint Server is an integrated suite of server capabilities that can help improve organizational effectiveness by providing comprehensive content management and enterprise search, accelerating shared business processes, and facilitating information-sharing across boundaries for better business insight. Additionally, this collaboration and content management server provides IT professionals and developers with the platform and tools they need for server administration, application extensibility, and interoperability.
Conclusion
In the last few years Microsoft puts a lot of investment in order to improve BI capabilities in his products. Results are definitely here. Microsoft BI platform (also known as “Microsoft BI stack” consist of backbone products such as SQL Server (database engine, SIS, SSAS, SSRS) Microsoft Office(Excel) and Share Point Portal are able to respond to highly complex and comprehensive requests. The great quality of this platform is possibility to develop quite complex and comprehensive applications in relatively short timeframe. Integration of several products in one comprehensive state of the art solution also can result with les TCO. Finally, business professionals who are using solution are most important judge. They able to get a full informational potential from their organization in their native every day tool – Microsoft Excel. That is all what they need.
References:
SQL Server Books Online
3 votes | 2 comments






Comments
You are welcome!
Hi Bojan,
Excelent articule! short, concret important.
Only give you thanks for your time in transmited your knowledge.
Success!!
Ramiro
Post new comment