General Datawarehousing Interview Preparation Guide
A general data warehousing guide for job interview preparation, covering questions frequently asked in data warehousing interviews.

40 General Datawarehousing Questions and Answers:

1 :: What is Normalization, First Normal Form, Second Normal Form, Third Normal Form?

1. Normalization is the process of assigning attributes to entities. It reduces data redundancies, helps eliminate data anomalies, and produces controlled redundancies to link tables.

2. Normalization is the analysis of functional dependencies between attributes / data items of user views. It reduces a complex user view to a set of small and stable subgroups of fields / relations.

1NF: Repeating groups must be eliminated, dependencies can be identified, all key attributes are defined, and there are no repeating groups in the table.

2NF: The table is already in 1NF and includes no partial dependencies (no attribute depends on only a portion of the primary key). It is still possible to exhibit transitive dependency: attributes may be functionally dependent on non-key attributes.

3NF: The table is already in 2NF and contains no transitive dependencies.
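
As an illustration of the three normal forms, here is a minimal Python sketch that decomposes a small order table step by step. The table layout, column names, and sample rows are all made up for the example.

```python
# Hypothetical unnormalized rows: each order holds a repeating group of items.
unnormalized = [
    {"order_id": 1, "customer_id": 10, "customer_city": "Pune",
     "items": [("P1", "Pen", 2), ("P2", "Ink", 1)]},
    {"order_id": 2, "customer_id": 10, "customer_city": "Pune",
     "items": [("P1", "Pen", 5)]},
]

# 1NF: eliminate the repeating group, one row per (order_id, product_id).
first_nf = [
    {"order_id": o["order_id"], "customer_id": o["customer_id"],
     "customer_city": o["customer_city"],
     "product_id": pid, "product_name": name, "qty": qty}
    for o in unnormalized
    for pid, name, qty in o["items"]
]

# 2NF: remove partial dependencies on part of the key (order_id, product_id).
# product_name depends only on product_id; customer columns only on order_id.
order_items = [
    {"order_id": r["order_id"], "product_id": r["product_id"], "qty": r["qty"]}
    for r in first_nf
]
products = {r["product_id"]: r["product_name"] for r in first_nf}
orders_2nf = {
    r["order_id"]: {"customer_id": r["customer_id"],
                    "customer_city": r["customer_city"]}
    for r in first_nf
}

# 3NF: remove the transitive dependency
# order_id -> customer_id -> customer_city.
orders = {oid: row["customer_id"] for oid, row in orders_2nf.items()}
customers = {row["customer_id"]: row["customer_city"]
             for row in orders_2nf.values()}
```

After the last step the city "Pune" is stored once, in customers, instead of once per order line.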

2 :: Explain piconet?

The original Piconet was a USB-style expansion port on RM Nimbus computers.

These days, a piconet is an ad-hoc computer network linking a group of devices using Bluetooth technology protocols to allow one master device to interconnect with up to seven active slave devices (because active slaves are addressed with a three-bit address, and one of the eight values is reserved for broadcast). Up to 255 further slave devices can be inactive, or parked, which the master device can bring into active status at any time.

A piconet typically has a range of about 10 m and a transfer rate between about 400 and 700 kbit/s, depending on whether synchronous or asynchronous connection is used.

All parked slaves have an 8-bit parked member address (PMA), and all active slaves have a 3-bit active member address (AMA). The AMA is used by the master to send packets to a specific slave and to identify that the slave has sent a response packet.
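
The limits of seven active and 255 parked slaves follow from the address widths. A small sketch of the arithmetic, assuming the all-zero value of each address is reserved and never assigned to a slave:

```python
AMA_BITS = 3  # active member address width, from the text
PMA_BITS = 8  # parked member address width, from the text

# Assumption: the all-zero value of each address is reserved.
max_active_slaves = 2 ** AMA_BITS - 1
max_parked_slaves = 2 ** PMA_BITS - 1

print(max_active_slaves, max_parked_slaves)
```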

3 :: What are the different methods of loading dimension tables?

Conventional Load:
Before the data is loaded, every row is checked against all the table constraints.

Direct Load (faster loading):
All the constraints are disabled and the data is loaded directly. Afterwards the data is checked against the table constraints, and the bad data is not indexed.
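
The difference between the two methods can be sketched in a few lines of Python. This is a simplified model, not any loader's real behavior: the function names and the single constraint in is_valid are hypothetical.

```python
def is_valid(row):
    # Hypothetical constraint: the key must be present and non-negative.
    return row.get("key") is not None and row["key"] >= 0

def conventional_load(rows):
    """Check every row against the constraint before it enters the table."""
    table, rejected = [], []
    for row in rows:
        (table if is_valid(row) else rejected).append(row)
    return table, rejected

def direct_load(rows):
    """Copy all rows in bulk with checks disabled, then validate afterwards."""
    table = list(rows)
    bad = [row for row in table if not is_valid(row)]
    return table, bad
```

In this model the conventional load never lets a bad row into the table, while the direct load holds every row and only reports the bad ones after the bulk copy.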

4 :: What is ODS?

1. ODS means Operational Data Store.

Submitted by Francis C. ( xxchen74 @ hotmail . com )

2. A collection of operational or base data that is extracted from operational databases and standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse. The ODS may also be used to audit the data warehouse to ensure that summarized and derived data is calculated properly. The ODS may further become the enterprise shared operational database, allowing operational systems that are being reengineered to use the ODS as their operational database.

5 :: What are Data Marts?

Data marts are designed to help managers make strategic decisions about their business.

Data marts are a subset of the corporate-wide data that is of value to a specific group of users.

There are two types of data marts:

1. Independent data marts: sourced from data captured from OLTP systems, from external providers, or from data generated locally within a particular department or geographic area.

2. Dependent data marts: sourced directly from the enterprise data warehouse.

6 :: What is the level of granularity of a fact table?

Level of granularity means the level of detail that you put into the fact table in a data warehouse. For example, based on the design you can decide to store sales data for each transaction. The level of granularity then determines how much detail you keep for each fact: you might record product sales for every individual transaction, or aggregate them up to the minute and store only the per-minute totals.
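
To make the choice of grain concrete, here is a minimal sketch that rolls transaction-level sales up to a per-minute grain. The sample rows and the function name to_minute_grain are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transaction-grain rows: (timestamp, product, amount).
transactions = [
    ("2024-01-01 09:00:05", "P1", 10.0),
    ("2024-01-01 09:00:40", "P1", 5.0),
    ("2024-01-01 09:01:10", "P1", 7.5),
]

def to_minute_grain(rows):
    """Aggregate transaction rows into one row per (minute, product)."""
    totals = defaultdict(float)
    for ts, product, amount in rows:
        minute = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(second=0)
        totals[(minute, product)] += amount
    return dict(totals)

per_minute = to_minute_grain(transactions)
```

The coarser grain needs fewer rows, but the individual transactions can no longer be recovered from it.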

7 :: What is VLDB?

VLDB stands for Very Large Database.

It is an environment or storage space managed by a relational database management system (RDBMS) consisting of vast quantities of information.

8 :: What is SCD1 , SCD2 , SCD3?

SCD stands for Slowly Changing Dimensions.

SCD1: only the current (updated) values are maintained.

Ex: when a customer's address is modified, we overwrite the existing record with the new address.

SCD2: maintains both historical and current information by adding a new row for each change, tracked by using

A) Effective Date

B) Versions

C) Flags

or a combination of these

SCD3: maintains limited historical information alongside the current information by adding new columns to the target table (for example, a column holding the previous value).
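
The three types can be compared side by side on a customer address change. This is a minimal sketch: the column names (start_date, end_date, version, is_current, previous_address) are assumptions, and a real warehouse would apply these rules inside its ETL tool or in SQL.

```python
from datetime import date

def scd1(row, new_address):
    # Type 1: overwrite in place; no history is kept.
    return {**row, "address": new_address}

def scd2(rows, customer_id, new_address, change_date):
    # Type 2: close the current row and append a new version.
    out = []
    version = 0
    for r in rows:
        if r["customer_id"] == customer_id and r["is_current"]:
            r = {**r, "end_date": change_date, "is_current": False}
            version = r["version"]
        out.append(r)
    out.append({"customer_id": customer_id, "address": new_address,
                "start_date": change_date, "end_date": None,
                "version": version + 1, "is_current": True})
    return out

def scd3(row, new_address):
    # Type 3: keep the previous value in an extra column.
    return {**row, "previous_address": row["address"], "address": new_address}
```

The scd2 function uses all three tracking devices from the list above at once (effective dates, a version number, and a current-row flag); in practice one or two of them is often enough.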

9 :: What are semi-additive and factless facts, and in which scenarios will you use such kinds of fact tables?

Semi-additive facts can be summed across some dimensions but not across others, typically not across time. Snapshot facts are semi-additive; we use them when maintaining point-in-time values such as balances.

Ex: Average daily balance

A fact table without numeric fact columns is called a factless fact table.

Ex: Promotion Facts

It is used when recording the promotions that apply to a transaction (ex: product samples), because this table doesn't contain any measures.
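
A short sketch of both ideas, using made-up balance snapshots and promotion rows. The balance can be summed across accounts for a single day, but across days it has to be averaged; the factless table can only be counted.

```python
# Hypothetical end-of-day balance snapshots: (day, account, balance).
snapshots = [
    (1, "A", 100.0), (1, "B", 50.0),
    (2, "A", 120.0), (2, "B", 50.0),
]

def total_balance_on(day):
    # Additive across the account dimension for a single day.
    return sum(balance for d, _, balance in snapshots if d == day)

def average_daily_balance(account):
    # Not additive across time: average the snapshots instead of summing.
    values = [balance for _, a, balance in snapshots if a == account]
    return sum(values) / len(values)

# Factless fact: rows record only that an event happened (keys, no measures).
promotion_facts = [
    ("2024-01-01", "P1", "SAMPLE"),
    ("2024-01-01", "P2", "SAMPLE"),
]
promoted_products = len({product for _, product, _ in promotion_facts})
```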

10 :: Explain SSL?

The Secure Sockets Layer (SSL) is a protocol for managing the security of a message transmission on the Internet. SSL has been succeeded by Transport Layer Security (TLS), which is based on SSL. SSL uses a program layer located between the Internet's Hypertext Transfer Protocol (HTTP) and Transmission Control Protocol (TCP) layers. Developed by Netscape, SSL was included in both the Microsoft and Netscape browsers and in most Web server products, gained the support of other Internet client/server developers, and became the de facto standard until evolving into Transport Layer Security. The "sockets" part of the term refers to the sockets method of passing data back and forth between a client and a server program in a network or between program layers in the same computer. SSL uses the public-and-private key encryption system from RSA, which also includes the use of a digital certificate.

TLS and SSL are an integral part of most Web browsers (clients) and Web servers. If a Web site is on a server that supports SSL, SSL can be enabled and specific Web pages can be identified as requiring SSL access. Historically, a Web server could be enabled by using Netscape's SSLRef program library, which could be downloaded for noncommercial use or licensed for commercial use.

TLS and SSL are not directly interoperable. However, TLS 1.0 includes a mechanism that lets a TLS implementation fall back to SSL 3.0 when the other side handles SSL but not TLS.
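
In current software the choice of protocol version is usually a configuration setting. As a minimal sketch using Python's standard ssl module, a client context can be told to refuse SSL and early TLS versions. Choosing TLS 1.2 as the floor is an assumption for the example, not a recommendation from the text.

```python
import ssl

# Client-side context with the library's default certificate checks.
context = ssl.create_default_context()

# Refuse anything older than TLS 1.2, which excludes every SSL version.
context.minimum_version = ssl.TLSVersion.TLSv1_2
```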

11 :: What are the various reporting tools in the market?

1. MS-Excel
2. Business Objects (Crystal Reports)
3. Cognos (Impromptu, PowerPlay)
4. MicroStrategy
5. MS Reporting Services
6. Informatica Power Analyzer
7. Actuate
8. Hyperion (BRIO)
9. Oracle Express OLAP
10. ProClarity

12 :: Explain the Difference between OLTP and OLAP?

Main Differences between OLTP and OLAP are:-

1. User and System Orientation

OLTP: customer-oriented, used for transaction and query processing by clerks, clients and IT professionals.

OLAP: market-oriented, used for data analysis by knowledge workers (managers, executives, analysts).

2. Data Contents

OLTP: manages current data, very detail-oriented.

OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, stores information at different levels of granularity to support decision making process.

3. Database Design

OLTP: adopts an entity-relationship (ER) model and an application-oriented database design.

OLAP: adopts star, snowflake or fact constellation model and a subject-oriented database design.

4. View

OLTP: focuses on the current data within an enterprise or department.

OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores.
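
The contrast in workload can be sketched with Python's built-in sqlite3 module. The sales table and its rows are hypothetical, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, "
    "year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
    [("East", 2023, 100.0), ("East", 2024, 150.0), ("West", 2024, 80.0)],
)

# OLTP-style: a short transaction that touches one current, detailed row.
conn.execute("UPDATE sales SET amount = amount + 20 WHERE id = 3")
oltp_row = conn.execute("SELECT amount FROM sales WHERE id = 3").fetchone()

# OLAP-style: summarize historical data across many rows.
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```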

13 :: What is Snowflake Schema?

In a snowflake schema, each dimension has a primary dimension table, to which one or more additional dimension tables can join. The primary dimension table is the only table that can join to the fact table.

14 :: What is a lookup table?

A lookup table is one that is used when updating a warehouse. When the lookup is placed on the target table (fact table / warehouse) based on the primary key of the target, incoming rows are compared against it so that only new or changed records are inserted or updated, according to the lookup condition.
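
The lookup-then-update behavior can be sketched as follows. The function name apply_lookup and the dict-based target are assumptions for illustration; ETL tools implement the same idea as a lookup transformation.

```python
def apply_lookup(target, incoming):
    """target: dict keyed by primary key; incoming: list of (key, row)."""
    inserted, updated = 0, 0
    for key, row in incoming:
        existing = target.get(key)  # the lookup on the target's primary key
        if existing is None:
            target[key] = row       # new record: insert
            inserted += 1
        elif existing != row:
            target[key] = row       # changed record: update
            updated += 1
        # unchanged records are skipped
    return inserted, updated
```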

15 :: What are the differences between star and snowflake schemas?

Star schema - all dimensions are linked directly to a fact table.
Snowflake schema - dimensions may be interlinked or may have one-to-many relationships with other tables.
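
The two layouts can be compared with a small sqlite3 sketch. The table and column names are hypothetical; the same sales fact is queried once through a denormalized (star) product dimension and once through a product dimension with a separate category table (snowflake).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the product dimension is denormalized and joins straight to the fact.
CREATE TABLE dim_product_star (
    product_id INTEGER PRIMARY KEY, name TEXT, category_name TEXT);

-- Snowflake: category is split into its own table behind the product dimension.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product_snow (
    product_id INTEGER PRIMARY KEY, name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id));

CREATE TABLE fact_sales (product_id INTEGER, amount REAL);

INSERT INTO dim_product_star VALUES (1, 'Pen', 'Stationery');
INSERT INTO dim_category VALUES (7, 'Stationery');
INSERT INTO dim_product_snow VALUES (1, 'Pen', 7);
INSERT INTO fact_sales VALUES (1, 9.5), (1, 0.5);
""")

# Star: one join from the fact table to the dimension.
star = conn.execute("""
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star p ON p.product_id = f.product_id
    GROUP BY p.category_name
""").fetchall()

# Snowflake: an extra join through the secondary dimension table.
snow = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_snow p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category_name
""").fetchall()
```

Both queries return the same totals; the snowflake form trades an extra join for less repeated category data.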

16 :: What is real-time data warehousing?

Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now. The activity could be anything such as the sale of widgets. Once the activity is complete, there is data about it.

Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.

17 :: What are various ETL tools in the Market?

Various ETL tools used in the market are:

1. Informatica
2. DataStage
3. MS-SQL DTS (Integration Services in SQL Server 2005)
4. Ab Initio
5. SQL*Loader
6. Sunopsis
7. Oracle Warehouse Builder
8. Data Junction

18 :: What is pre-emptive and non-pre-emptive?

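In general usage, pre-emptive means taken as a measure against something possible, anticipated, or feared; preventive; deterrent: a pre-emptive tactic against a ruthless business rival. Non-pre-emptive is the exact opposite: no such preventive measure has been taken.

In computing, the terms usually describe task scheduling. A pre-emptive scheduler can interrupt a running task to give the CPU to another task, while a non-pre-emptive scheduler lets each task run until it finishes or gives up the CPU voluntarily.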

19 :: What type of indexing mechanism do we need to use for a typical data warehouse?

On the fact table, bitmap indexes are usually the best fit, particularly on low-cardinality columns such as dimension keys. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes.

To my knowledge, SQL Server does not let you create bitmap indexes, whereas Oracle does.
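
To show why bitmap indexes suit low-cardinality columns, here is a toy sketch that keeps one bitmap per distinct value, stored as a Python integer, and answers a two-column filter with a single bitwise AND. The column data and function names are made up; real database engines compress and store bitmaps very differently.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def rows_of(bitmap):
    """Return the row ids whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical fact-table columns, one entry per row.
region = ["East", "West", "East", "North"]
channel = ["Web", "Web", "Store", "Web"]

region_idx = build_bitmap_index(region)
channel_idx = build_bitmap_index(channel)

# WHERE region = 'East' AND channel = 'Web' becomes one bitwise AND.
east_and_web = rows_of(region_idx["East"] & channel_idx["Web"])
```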

20 :: What are the advantages of RAID 1, 1/0, and 5? What type of RAID setup would you put your TX logs on?

Transaction logs are written sequentially and are rarely read during normal operation. The ideal is to put them on RAID 1/0 because it has much better write performance than RAID 5.

RAID 1 is also a good choice for TX logs and costs less than 1/0 to implement. Generally speaking, it has a tad less reliability and its performance is a little worse.

RAID 5 is generally best for data because of its lower cost and the fact that it provides great read capability.
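
The trade-offs can be put into rough numbers. The sketch below assumes equal-size disks and two-way mirroring, and uses the commonly quoted rule of thumb for physical I/Os per small random write (2 for mirrored levels, 4 for RAID 5, which must read and rewrite both data and parity). It also shows how RAID 5 rebuilds a lost block from XOR parity. All function and constant names are hypothetical.

```python
from functools import reduce
from operator import xor

def usable_disks(level, n):
    """Disks' worth of usable capacity out of n equal disks."""
    if level in ("1", "1/0"):
        return n // 2  # assumption: two-way mirroring
    if level == "5":
        return n - 1   # one disk's worth of parity
    raise ValueError(level)

# Rule-of-thumb physical I/Os per small random write (an approximation).
WRITE_PENALTY = {"1": 2, "1/0": 2, "5": 4}

def xor_parity(blocks):
    """Byte-wise XOR of equal-length blocks, as used for RAID 5 parity."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(data)

# If the disk holding data[1] fails, XOR of the survivors restores it.
rebuilt = xor_parity([data[0], data[2], parity])
```

With four disks, RAID 5 keeps three disks' worth of capacity against two for RAID 1/0, which is the cost advantage mentioned above; the higher write penalty is why it is a poor fit for write-heavy transaction logs.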