Sunday, January 31, 2021

Complete Tutorial for Beginners on Business Objects 3.0 to 4.2

  

I want to give the user a playful, logical learning experience of Business Objects, one that will help every beginner learn and get complete information.


The story behind Business Objects: it was originally a French company and a market leader in Business Intelligence software, which enabled companies to gain information from the vast amounts of raw data they keep. SAP announced the acquisition of BOBJ in 2007 and completed it in early 2008, and its software was combined with SAP software to provide reporting functionality on top of SAP solutions.

SAP Business Objects (BO, BOBJ, or BObjects) is an enterprise software company, specializing in business intelligence (BI). Business Objects was acquired in 2007 by German company SAP AG. The company claimed more than 46,000 customers in its final earnings release prior to being acquired by SAP.

Actually, BO was created with the data analyst in mind, who needs to perform predictive analysis based on existing data and generate reports; the problem arises when the data becomes huge and difficult to manage.

SAP Business Objects BI or SAP BO, formerly known as BOBJ, is an enterprise software solution that specializes in reporting and analytical business intelligence (BI). It comprises various reporting applications that help the users to find data, conduct analytics, and generate reports.

SAP Business Objects also offers other services such as consulting and education. It aims to simplify the process of reporting and analytics allowing SAP customers to deploy business intelligence projects and perform predictive analytics without any support from data analysts.

Furthermore, the SAP BO BI apps make use of the drag-and-drop functionality so that the users can discover and analyse data from various sources. SAP BO is a front-end-based BI platform which integrates the data from different back-end sources instead of storing it at the application level.


So Business Objects is basically a reporting tool, used to perform predictive analysis on reports built from the enterprise data available with the company, so that future decisions can be made based on it.


So before understanding Business Objects you need to understand data warehouse concepts and databases, and then Business Objects itself.


Let's begin with the data warehouse concept. Every IT business runs on data, so in order to manage and store that data we need data storage, which is organized based on the concept of data warehousing. The key concepts are:


  • Data warehouse
  • Business Intelligence
  • Data Mining
  • Data Mart
  • OLTP
  • OLAP
  • Normalization
  • Denormalization
  • Composite Key, Candidate Key, Unique Key, Surrogate key
  • Dimensions, Facts, Aggregated Facts
  • Dimensional Modelling
  • Conformed dimensions
  • Forms of Dimensional modelling
  • Normal Forms
  • Data Points
  • Slowly changing Dimensions

 

1.       Data warehouse:

Formal Definition:

A data warehouse is a subject oriented, integrated, time variant and non-volatile collection of data in support of management decision making process.

It means:

Subject oriented: Stored data targets specific subject

E.g. it may store data like Sales, Number of customers etc. and not general data on everyday operations.

Integrated: Data may be distributed across heterogeneous sources which have to be integrated.

E.g. Sales data may be on RDBMS, customer data may be on flat files

Time variant: Stored data may not be current; it varies with time, and the data has an element of time.

E.g. Data of Sales in last 5 years

Non Volatile: It is separate from the enterprise operational database and may not be subject to frequent modifications. It generally has two operations performed on it: Loading of data and Access of data

 

 

 

My understanding:

A data warehouse integrates data from different or multiple data sources for reporting and analysis. It stores a huge amount of data, more historical than current. It is a subject oriented, integrated, time variant, non-volatile collection of data to support the decision making process for a business. It does not require data to be highly accurate; the goal is to execute statistical queries and provide results which can influence the decision making process for an enterprise.

These systems are thus called Online Analytical Processing (OLAP) systems.
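
To make this concrete, here is a small SQL sketch of the kind of statistical query a data warehouse is built to answer. The SALES table and its columns are made up for illustration; the point is that the query aggregates several years of historical data instead of reading individual transactions.

-- Total revenue per product category for each of the last 5 years
SELECT product_category,
       sales_year,
       SUM(revenue) AS total_revenue
FROM   SALES
WHERE  sales_year BETWEEN 2016 AND 2020
GROUP BY product_category, sales_year
ORDER BY product_category, sales_year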

 

2.       Business Intelligence:

Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.

BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics.

 

My Understanding:

Business intelligence constitutes a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise management or users to make better business decisions. BI applications include the activities of decision support systems, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining.

E.g. you can follow the growth trend of your business over the last ten years, and based on this data management can forecast business growth and take future decisions that help the business or enterprise grow.

3.       Data Mining:

Data mining is the process that attempts to discover patterns in large data sets. It is done for predictive analysis. It utilizes methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

My Understanding:

Data mining deals with analyzing large data sets to extract specific information and discover meaningful patterns that help grow a business or enterprise. It is done for predictive analysis. The analysis is based on historical data plus current data. There are tools used for data mining; the most commonly used tools are:

- Google Chart API: Google's Web-based tool allows users to create charts and graphs that can be easily embedded into a website.

- SAS Analytics: SAS's software includes tools for data mining and statistical analysis as well as forecasting trends.

E.g. a shopping mall offers a season-end sale with some predefined combinations of products to increase sales and ultimately to get profitable growth from the business. The predefined combinations are decided based on the analysis of a large data set using data mining tools.

4.       Data Mart:

A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset of the data warehouse that is usually oriented to a specific business line or team. In some deployments, each department or business unit is considered the owner of its data mart including all the hardware, software and data. This enables each department to use, manipulate and develop their data any way they see fit; without altering information inside other data marts or the data warehouse. In other deployments where conformed dimensions are used, this business unit ownership will not hold true for shared dimensions like customer, product etc.

My Understanding:

Data mart is a concept of organization of data into smaller units. A data mart is a subset of data warehouse specific to a particular business stream.

E.g. a shopping mall has outlets of different brands doing business under a common shelter. Each outlet is interested in viewing its own data for business decisions and not any other outlet's data. To organize the data in an efficient way, data marts are created that contain the data relevant to a specific brand outlet for its analysis.
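
As a rough sketch (the table and view names are invented for illustration), a data mart can be as simple as a view or a separate table that carves out one brand's slice of the central warehouse:

-- Data mart for a single brand outlet, carved out of the central warehouse
CREATE VIEW BRAND_A_SALES_MART AS
SELECT sale_date,
       store_id,
       product_id,
       quantity,
       revenue
FROM   WAREHOUSE_SALES
WHERE  brand_code = 'BRAND_A'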

5.       OLTP: Online transaction processing

Online transaction processing, or OLTP, refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing.

Transaction processing is a type of computer processing in which the computer responds immediately to user requests. Each request is considered to be a transaction. Automatic teller machines for banks are an example of transaction processing.

OLTP (online transaction processing) is a class of software programs capable of supporting transaction-oriented applications on the Internet. Typically, OLTP systems are used for order entry, financial transactions, customer relationship management (CRM) and retail sales.

My Understanding:

OLTP is basically used for transaction-oriented processing. It is used for data entry based on each transaction and retrieval of the same in a transactional manner. OLTP is based on operational data. E.g. ATMs are one of the most common examples of OLTP.
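
A small sketch of a typical OLTP unit of work, using an ATM-style withdrawal. The table names are hypothetical and the exact transaction syntax varies by database (this sketch uses SQL Server style):

-- One short transaction per user request: debit the account and log it
BEGIN TRANSACTION;

UPDATE ACCOUNTS
SET    balance = balance - 100
WHERE  account_id = 12345;

INSERT INTO ACCOUNT_TRANSACTIONS (account_id, amount, txn_type, txn_time)
VALUES (12345, -100, 'ATM_WITHDRAWAL', CURRENT_TIMESTAMP);

COMMIT;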

6.       OLAP: Online Analytical Processing

On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

OLAP tools enable users to interactively analyze multidimensional data from multiple perspectives. OLAP consists of three basic analytical operations:

Consolidation (roll-up): Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends.

Drilldown: In contrast, the drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by individual products that make up a region’s sales.

 Slicing and dicing: It is a feature whereby users can take out (slicing) a specific set of data of the OLAP cube and view (dicing) the slices from different viewpoints.

Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time. They borrow aspects of navigational databases, hierarchical databases and relational databases.

My Understanding:

OLAP is commonly based on analyzing historical data. It helps management of an enterprise or business to develop insights in an effective, fast and interactive way. OLAP tools are used to analyze multidimensional data from multiple views or perspective.
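
Most major relational databases let you express the roll-up operation described above directly in SQL. The sketch below assumes a hypothetical SALES table; GROUP BY ROLLUP produces subtotals at each level of the division -> department -> office hierarchy plus a grand total.

-- Roll up office-level sales to department and division subtotals
SELECT sales_division,
       sales_department,
       sales_office,
       SUM(sales_amount) AS total_sales
FROM   SALES
GROUP BY ROLLUP (sales_division, sales_department, sales_office)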

7.       OLTP vs OLAP:

  • OLTP systems work on current, operational data and handle day-to-day transactions (data entry and retrieval); OLAP systems work mostly on historical data and support analysis and decision making.
  • OLTP databases are highly normalized and optimized for many short reads and writes; OLAP databases are often denormalized (star or snowflake schemas) and optimized for complex queries that aggregate large volumes of data.
  • Typical OLTP examples are ATM withdrawals and order entry; typical OLAP examples are sales trend analysis and management reporting.

8.       Normalization:

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

My Understanding:

The main goal of the normalization process is organizing data in an effective manner. Normalization reduces redundancy, so that the same data is not stored in multiple tables in a database, and ensures that only related data is stored in a table.
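
A minimal sketch of what normalization looks like in practice, using made-up tables: instead of repeating customer details on every order row, the customer data moves to its own table and each order keeps only a foreign key.

-- Before: customer name/city repeated on every order row (redundant)
-- After: customer stored once, orders reference it by key
CREATE TABLE Customers
(
Customer_Id int NOT NULL PRIMARY KEY,
Customer_Name varchar(255) NOT NULL,
City varchar(255)
);

CREATE TABLE Orders
(
Order_Id int NOT NULL PRIMARY KEY,
Customer_Id int NOT NULL REFERENCES Customers (Customer_Id),
Order_Date date,
Amount decimal(10,2)
);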

9.       Denormalization:

Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data. In some cases, denormalization helps cover up the inefficiencies inherent in relational database software. A valid reason for denormalizing a relational design is to enhance performance.

A true relational DBMS would allow for a fully normalized database at the logical level, while providing physical storage of data that is tuned for high performance. De-normalization is a technique to move from higher to lower normal forms of database modeling in order to speed up database access.

My Understanding:

Denormalization is the concept of adding redundant data or grouping data in order to optimize the read performance of a database.
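
A hedged example of denormalization for reporting, continuing the made-up Customers/Orders tables above: the customer's name and city are copied into a reporting table so that a report can be produced without a join, at the cost of redundant data.

-- Denormalized reporting table: customer attributes duplicated per order
CREATE TABLE Orders_Reporting
(
Order_Id int NOT NULL PRIMARY KEY,
Customer_Id int NOT NULL,
Customer_Name varchar(255),   -- redundant copy of Customers.Customer_Name
Customer_City varchar(255),   -- redundant copy of Customers.City
Order_Date date,
Amount decimal(10,2)
);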

10.   Composite Key:

A Primary Key uniquely identifies each row in a table. It is not always a single-column key; it could be:

    1. a single-column key
    2. or a composite key

A primary key can consist of one or more columns of a table. When two or more columns are used as a primary key, they are called a composite key. Each single column's data can be duplicated but the combination values of these columns cannot be duplicated.

For example, if you have a Student table and a Course table, and one student can select many courses and one course can be selected by many students, this is a many-to-many relationship. You therefore need to create a third table to define the relationship, say it is called StudentCourse. It is important to note that you only need StudentID and CourseID in this table, used together as a composite key. You do not need an extra identity ID column to uniquely identify each row; in fact, an ID column alone is not sufficient, because it cannot prevent the same student selecting the same course from being inserted into this table twice (see the sketch below).
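
A sketch of the StudentCourse table described above, with the composite primary key declared on (StudentID, CourseID) so the same student cannot select the same course twice:

CREATE TABLE StudentCourse
(
StudentID int NOT NULL,
CourseID int NOT NULL,
PRIMARY KEY (StudentID, CourseID)   -- composite key: the same pair cannot be inserted twice
);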

 

Candidate Key:

Candidate key is a combination of attributes that can be uniquely used to identify a database record without any extraneous data. Each table may have one or more candidate keys. One of these candidate keys is selected as the table primary key.

 

Unique Key:

The UNIQUE constraint uniquely identifies each record in a database table. The UNIQUE and PRIMARY KEY constraints both provide a guarantee for uniqueness for a column or set of columns. A PRIMARY KEY constraint automatically has a UNIQUE constraint defined on it.

Note that you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table.

e.g. The following SQL creates a UNIQUE constraint on the "P_Id" column when the "Persons" table is created:

CREATE TABLE Persons
(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
UNIQUE (P_Id)          -- no two rows may share the same P_Id value
)

Surrogate key:

A surrogate key is a substitution for the natural primary key.

It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.

Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the dimension tables' primary keys. They can use an Informatica (Infa) sequence generator, an Oracle sequence, or SQL Server identity values for the surrogate key.

It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult.

Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users), but not only can these change, indexing on a numerical value is usually better. You could therefore consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system, and as far as the client is concerned you may display only the AIRPORT_NAME.
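
A rough sketch of such a surrogate key on an airport dimension, using SQL Server IDENTITY syntax (an Oracle sequence would be the equivalent there); the table and column names are illustrative:

CREATE TABLE DIM_AIRPORT
(
AIRPORT_ID int IDENTITY(1,1) PRIMARY KEY,   -- surrogate key, internal to the system
AIRPORT_CODE varchar(10) NOT NULL,          -- natural/business key
AIRPORT_NAME varchar(255) NOT NULL,         -- what the client actually sees
CITY_NAME varchar(255)
);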

11.   Fact Table :

In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is located at the center of a star schema or a snowflake schema surrounded by dimension tables.

Fact tables provide the (usually) additive values that act as independent variables by which dimensional attributes are analyzed. Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store.

A fact table is used in dimensional model in data warehouse design. A fact table is  found at the center of a star schema or snowflake schema surrounded by dimension tables.

A fact table consists of facts of a particular business process e.g., sales revenue by month by product. Facts are also known as measurements or metrics. A fact table record captures a measurement or a metric.

My Understanding:

A fact table consists of measurable, summable, calculable metrics of a business process. Fact tables provide values that act as variables based on which dimensional attributes are analyzed. Fact tables contain fact values and foreign keys used to access the dimension tables.

Dimension table:

In data warehousing, a dimension table is one of the set of companion tables to a fact table.

The fact table contains business facts or measures and foreign keys which refer to candidate keys (normally primary keys) in the dimension tables.

Contrary to fact tables, the dimension tables contain descriptive attributes (or fields) which are typically textual fields or discrete numbers that behave like text. These attributes are designed to serve two critical purposes: query constraining/filtering and query result set labeling.

Dimension table rows are uniquely identified by a single key field. It is recommended that the key field be a simple integer, because the key value is meaningless and is only used to join the fact and dimension tables.

My Understanding:

Dimension table consists of different entities that are non-measurable and non-aggregatable. Dimension table is one of the set of companion tables to a fact table. Dimension tables contain fields which are typically textual fields or discrete numbers that behave like text.

A dimension table is a table in a star schema of a data warehouse. A dimension table stores attributes, or dimensions, that describe the objects in a fact table.

Dimension tables are referenced by fact tables using keys. When creating a dimension table in a data warehouse, a system-generated key is used to uniquely identify a row in the dimension. This key is also known as a surrogate key. The surrogate key is used as the primary key in the dimension table. The surrogate key is placed in the fact table and a foreign key is defined between the two tables. When the data is joined, it does so just as any other join within the database.
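
The join between a fact table and its dimension tables then looks like an ordinary SQL join on the surrogate keys. The table and column names below are hypothetical:

-- Sales quantity by product and store, joining the fact table to two dimensions
SELECT d_product.product_name,
       d_store.store_name,
       SUM(f.sales_quantity) AS total_quantity
FROM   FACT_SALES f
       JOIN DIM_PRODUCT d_product ON f.product_key = d_product.product_key
       JOIN DIM_STORE   d_store   ON f.store_key   = d_store.store_key
GROUP BY d_product.product_name, d_store.store_name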

12.   Database modeling:

A database model is a theory or specification describing how a database is structured and used. Several such models have been suggested.

Common models include:

-          Hierarchical model

-          Network model

-          Relational model

-          Entity-relationship

-          Object-relational model

-          Multivalue model

-          Object model

-          Document model

A data model is not just a way of structuring data: it also defines a set of operations that can be performed on the data. The relational model, for example, defines operations such as select (project) and join. Although these operations may not be explicit in a particular query language, they provide the foundation on which a query language is built.

My Understanding:

Database modeling deals with the structuring of data in a database. It is a theory describing how a database is structured and used. Some of the examples are Relational model, Entity-relationship, Object-relational model.
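
For instance, in SQL the relational select/project operations mentioned above correspond to the WHERE clause and the column list, and a join combines two tables on a common column. The tables here are hypothetical, just to show the operations:

-- Project two columns, select rows by a condition, and join two relations
SELECT e.employee_name, d.department_name
FROM   EMPLOYEES e
       JOIN DEPARTMENTS d ON e.department_id = d.department_id
WHERE  e.hire_date >= '2020-01-01'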

13.   Dimensional modeling:

Dimensional modeling (DM) is the name of a set of techniques and concepts used in data warehouse design. It is considered to be different from entity-relationship modeling (ER).

Dimensional Modeling does not necessarily involve a relational database. The same modeling approach, at the logical level, can be used for any physical form, such as multidimensional database or even flat files. DM is a design technique for databases intended to support end-user queries in a data warehouse. It is oriented around understandability and performance

Dimensional modeling always uses the concepts of facts (measures), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions. Dimensional models are built by business process area, e.g. store sales, inventory, claims, etc. Because the different business process areas share some but not all dimensions, efficiency in design, operation, and consistency, is achieved using conformed dimensions.

 

 

My Understanding:

Dimensional modeling deals with the concept of facts and dimensions. It is a design proposal for databases intended to support end-user queries in a data warehouse. This modeling is used in data warehouse design.


14.   Conformed dimension:

A conformed dimension is a set of data attributes that have been physically implemented in multiple database tables using the same structure, attributes, domain values, definitions and concepts in each implementation. A conformed dimension cuts across many facts.

Conformed dimensions are either identical or strict mathematical subsets of the most granular, detailed dimension. Dimension tables are not conformed if the attributes are labeled differently or contain different values. Conformed dimensions come in several different flavors. At the most basic level, conformed dimensions mean exactly the same thing with every possible fact table to which they are joined. The date dimension table connected to the sales facts is identical to the date dimension connected to the inventory facts.

My Understanding:

Dimensions are conformed when they are either exactly the same (including keys) or one is a perfect subset of the other. Most important, the row headers produced in the answer sets from two different conformed dimensions must be able to match perfectly.
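
A sketch of a conformed date dimension shared by two fact tables (all names are made up): both the sales facts and the inventory facts join to the same DIM_DATE, so "by month" means exactly the same thing in both answer sets and the row headers match perfectly.

-- The same DIM_DATE rows and keys are used by both fact tables
SELECT d.calendar_month,
       SUM(s.sales_amount) AS monthly_sales
FROM   FACT_SALES s
       JOIN DIM_DATE d ON s.date_key = d.date_key
GROUP BY d.calendar_month;

SELECT d.calendar_month,
       SUM(i.quantity_on_hand) AS monthly_inventory
FROM   FACT_INVENTORY i
       JOIN DIM_DATE d ON i.date_key = d.date_key
GROUP BY d.calendar_month;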

15.   Forms of Dimensional Model:

Star schema:

Star schema (also called star-join schema, data cube, or multi-dimensional schema) is the simplest style of data warehouse schema. The star schema consists of one or more fact tables referencing any number of dimension tables. The star schema is an important special case of the snowflake schema, and is more effective for handling simpler queries.

A star schema classifies the attributes of an event into facts (measured numeric/time data), and descriptive dimension attributes (product id, customer name, sale date) that give the facts a context. A fact record is the nexus between the specific dimension values and the recorded facts. The facts are stored at a uniform level of detail (the grain) in the fact table. Dimension attributes are organized into affinity groups and stored in a minimal number of dimension tables.

Star schemas are designed to optimize user ease-of-use and retrieval performance by minimizing the number of tables to join to materialize a transaction.

A star schema is so called because its diagram resembles a star: a central fact table surrounded by the dimension tables as its points.

The fact table holds the metric values recorded for a specific event. Because of the desire to hold atomic level data, there generally are a very large number of records (billions). Special care is taken to minimize the number and size of attributes in order to constrain the overall table size and maintain performance. Fact tables generally come in 3 flavors - transaction (facts about a specific event e.g. Sale), snapshot (facts recorded at a point in time e.g. Account details at month end), and accumulating snapshot tables (e.g. month-to-date sales for a product).

Dimension tables usually have few records compared to fact tables, but may have a very large number of attributes that describe the fact data.
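
A minimal star schema sketch with hypothetical names: one fact table at the grain of "sales by day by product by store", and the surrounding dimension tables it references.

-- Dimension tables (kept deliberately small for the sketch)
CREATE TABLE DIM_DATE    (date_key int PRIMARY KEY, calendar_date date, calendar_month varchar(20));
CREATE TABLE DIM_PRODUCT (product_key int PRIMARY KEY, product_name varchar(255), category varchar(100));
CREATE TABLE DIM_STORE   (store_key int PRIMARY KEY, store_name varchar(255), region varchar(100));

-- Fact table: one row per day/product/store combination (the grain)
CREATE TABLE FACT_SALES
(
date_key int NOT NULL REFERENCES DIM_DATE (date_key),
product_key int NOT NULL REFERENCES DIM_PRODUCT (product_key),
store_key int NOT NULL REFERENCES DIM_STORE (store_key),
sales_quantity int,
sales_amount decimal(12,2),
PRIMARY KEY (date_key, product_key, store_key)
);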


Snowflake schema:

Snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake in shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions.

The snowflake schema is similar to the star schema. However, in the snowflake schema, dimensions are normalized into multiple related tables, whereas the star schema's dimensions are denormalized, with each dimension represented by a single table.

A complex snowflake shape emerges when the dimensions of a snowflake schema are elaborate, having multiple levels of relationships, and the child tables have multiple parent tables ("forks in the road"). The "snowflaking" effect only affects the dimension tables and NOT the fact tables.
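
In a snowflake schema a dimension is itself normalized into related tables. A hedged sketch with invented names: the product dimension is split so that category details live in their own table.

-- Snowflaked product dimension: category attributes moved to a separate table
CREATE TABLE DIM_CATEGORY
(
category_key int PRIMARY KEY,
category_name varchar(100),
department_name varchar(100)
);

CREATE TABLE DIM_PRODUCT_SNOWFLAKE
(
product_key int PRIMARY KEY,
product_name varchar(255),
category_key int NOT NULL REFERENCES DIM_CATEGORY (category_key)
);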

 

 

Fact Constellation Schema:

This schema is used mainly for aggregate fact tables, or where we want to split a fact table for better comprehension. The split of a fact table is done only when we want to focus on aggregation over a few facts and dimensions. This is also called a multiple star schema.

16. The Normal Forms:

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF.

First Normal Form (1NF)

First normal form (1NF) sets the very basic rules for an organized database:

  • Eliminate duplicative columns from the same table.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second Normal Form (2NF)

Second normal form (2NF) further addresses the concept of removing duplicative data:

  • Meet all the requirements of the first normal form.
  • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
  • Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form (3NF)

Third normal form (3NF) goes one large step further:

  • Meet all the requirements of the second normal form.
  • Remove columns that are not dependent upon the primary key.

Boyce-Codd Normal Form (BCNF or 3.5NF)

The Boyce-Codd Normal Form, also referred to as the "third and half (3.5) normal form", adds one more requirement:

  • Meet all the requirements of the third normal form.
  • Every determinant must be a candidate key.

 

Fourth Normal Form (4NF)

Finally, fourth normal form (4NF) has one additional requirement:

  • Meet all the requirements of the third normal form.
  • A relation is in 4NF if it has no multi-valued dependencies.

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.
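
As a small 1NF sketch, reusing the Persons/P_Id example from earlier (the PersonPhones table is hypothetical): storing several phone numbers in repeating columns of the same table violates the "no duplicative columns" rule, so 1NF pushes them into a separate table identified by a composite key.

-- Not in 1NF: repeating columns for the same kind of value
-- Persons_Bad (P_Id, LastName, Phone1, Phone2, Phone3)

-- In 1NF: one row per phone number, identified by a composite key
CREATE TABLE PersonPhones
(
P_Id int NOT NULL,
PhoneNumber varchar(30) NOT NULL,
PRIMARY KEY (P_Id, PhoneNumber)
);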

17.   Data Points:

A data point is a set of measurements on a single member of a statistical population, or a subset of those measurements for a given individual. In addition, in statistical graphics, a "data point" may be an individual item within a statistical display; such points may relate to either a single member of a population or to a summary statistic calculated for a given subpopulation.

18.   Slowly changing dimension:

Dimension is the logical groupings of data such as geographical location, customer or product information. With Slowly Changing Dimensions (SCDs) data changes slowly, rather than changing on a time-based, regular schedule.

For example, you may have a dimension in your database that tracks the sales records of your company's salespeople. Creating sales reports seems simple enough, until a salesperson is transferred from one regional office to another. How do you record such a change in your sales dimension?

You could calculate the sum or average of each salesperson's sales, but if you use that to compare the performance of salespeople, it might give misleading information. If the salesperson was transferred from a hot market where sales were easy and now works in a market where sales are infrequent, his/her totals will look much stronger than those of the other salespeople in the new region. Or you could create a second salesperson record and treat the transferred person as a new salesperson, but that creates its own problems.

Dealing with these issues involves SCD management methodologies referred to as Type 0 through 6. Type 6 SCDs are also sometimes called Hybrid SCDs. The most common types are I, II, and III.

Type 1:

This methodology overwrites old data with new data, and therefore does not track historical data. A common use is correcting misspelled names (assuming you won't need to know how the name was misspelled in the past).

For example, consider a supplier dimension table with the columns Supplier_Key, Supplier_Code and Supplier_State.

Here Supplier_Code is the natural key and Supplier_Key is a surrogate key. Technically, the surrogate key is not necessary, since the table is already unique by the natural key (Supplier_Code). However, to optimize performance on joins, an integer key is preferred over a character key.

If the supplier relocates its headquarters to Illinois, the update simply overwrites the old record with the new state.

The disadvantage of this approach is that there is no history in the data warehouse. An advantage is that it is easy to maintain.

If you have calculated an aggregate table summarizing facts by state, it will need to be recalculated when the Supplier_State is changed.
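
A minimal Type 1 sketch against the supplier table described above; the table name Supplier and the values 'ABC' and 'IL' are illustrative only.

-- Type 1: overwrite the attribute in place, keeping no history
UPDATE Supplier
SET    Supplier_State = 'IL'
WHERE  Supplier_Code = 'ABC';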

Type II:

This method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. Unlimited history is preserved for each insert.

For example, if the supplier relocates to Illinois, the existing row is kept and a new row is inserted with the version number incremented sequentially, as sketched below.
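
A hedged Type II sketch using the same supplier columns plus a Version column; the surrogate key value 124 and the code 'ABC' are illustrative.

-- Type II: keep the old row unchanged and add a new row with a new surrogate key
INSERT INTO Supplier (Supplier_Key, Supplier_Code, Supplier_State, Version)
VALUES (124, 'ABC', 'IL', 1);   -- the previous row keeps its old state and Version 0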

 

Type III:

This method tracks changes using separate columns and preserves limited history. Unlike Type II, which preserves unlimited history, Type III is limited to the number of columns designated for storing historical data. The original table structure in Type I and Type II is the same, but Type III adds additional columns. In the example sketched below, an additional column, Original_Supplier_State, is added to record the supplier's original state; only that previous history is stored.

The record then contains one column for the original state and one for the current state, so it cannot track the changes if the supplier relocates a second time.

One variation of this is to create the field Previous_Supplier_State instead of Original_Supplier_State which would track only the most recent historical change.
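
A Type III sketch (column names follow the text; the values and the ALTER TABLE syntax may vary by database): the extra column is added once, and a relocation then updates the current state while preserving the earlier one.

-- Type III: add a column once, then track only the original/previous value
ALTER TABLE Supplier ADD Original_Supplier_State varchar(50);

UPDATE Supplier
SET    Original_Supplier_State = Supplier_State,   -- copies the pre-move state
       Supplier_State          = 'IL'
WHERE  Supplier_Code = 'ABC';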

Type 4:

The Type 4 method is usually referred to as using "history tables", where one table keeps the current data, and an additional table is used to keep a record of some or all changes.

For the above example the original table name is Supplier and the history table is Supplier_History.
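
A rough Type 4 sketch, with the table names from the text; the Changed_On column and the data values are illustrative assumptions. Before overwriting the current row in Supplier, the old version is copied into Supplier_History.

-- Type 4: copy the current row into the history table, then overwrite it
INSERT INTO Supplier_History (Supplier_Key, Supplier_Code, Supplier_State, Changed_On)
SELECT Supplier_Key, Supplier_Code, Supplier_State, CURRENT_TIMESTAMP
FROM   Supplier
WHERE  Supplier_Code = 'ABC';

UPDATE Supplier
SET    Supplier_State = 'IL'
WHERE  Supplier_Code = 'ABC';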

 

 

My Understanding:

SCDs are dimensions that change over time, e.g. designation, address, contact number. To handle SCDs, three types are most widely used:

Type1:

Existing record is updated or overwritten and there is no track of the historical information.

Type2:

A new record is inserted; multiple records for a given natural key are kept in the dimension tables, with separate surrogate keys and/or different version numbers to differentiate the records uniquely.

Type3:

It adds a new column. This method tracks changes using separate columns and preserves limited history, since it is limited to the number of columns designated for storing historical data; Type II, by contrast, preserves unlimited history.
