Saturday, May 17, 2008

Data warehousing & data mining (MR, unit 3)

A data warehouse holds a wide variety of centrally stored data that presents a coherent picture of business conditions at a single point in time.
The data is used for: queries/reports, online analytical processing (OLAP, for multidimensional analysis) and data mining.
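
To make "multidimensional analysis" concrete, here is a minimal Python sketch of an OLAP-style roll-up. The sales rows, the dimension names and the `roll_up` helper are all invented for illustration; a real warehouse would run this kind of query in a database or OLAP tool rather than in plain Python.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales).
SALES = [
    ("North", "Bikes", "Q1", 120),
    ("North", "Bikes", "Q2", 150),
    ("North", "Helmets", "Q1", 40),
    ("South", "Bikes", "Q1", 90),
    ("South", "Helmets", "Q2", 60),
]

# Names of the dimensions, in the same order as the row fields.
DIMENSIONS = ("region", "product", "quarter")


def roll_up(rows, *dims):
    """Sum the sales figure grouped by the chosen dimensions."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # last field is the sales measure
    return dict(totals)


# The same data viewed along one dimension, then along two.
by_region = roll_up(SALES, "region")
by_region_quarter = roll_up(SALES, "region", "quarter")
print(by_region)
print(by_region_quarter)
```

Choosing different dimensions in `roll_up` gives a different "slice" of the same stored data, which is the basic idea behind multidimensional analysis.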

Data marts are smaller than warehouses and hold a selection of data for a specific purpose. They are cheaper and quicker to build, although one big warehouse is better than lots of marts.

Advantages:
  • easy access
  • wide range of data results in a wider organisational perspective
  • proven good at:
    • quantifying effect of marketing initiatives
    • improving knowledge about customers
    • identifying & understanding most profitable revenue streams

Data mining is a class of database applications that look for hidden patterns in a group of data. It is used for decision-making & predicting future behaviour.
Examples of patterns (mainly predictive or descriptive):
Clusters - patterns between ranges of data items e.g. people who are 20+, unmarried and earning £50k+ are more likely to buy a sports car
Association - one event correlates with another e.g. men buy nappies & beer together on the way home
Forecasting - identify trends that can be extrapolated into the future
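
The association and forecasting patterns above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the baskets and the figures are made up, `support`, `confidence` and `linear_forecast` are hypothetical helper names, and a straight-line least-squares fit stands in for whatever trend model a real tool would use.

```python
# Hypothetical shopping baskets, one set of items per customer visit.
BASKETS = [
    {"nappies", "beer", "crisps"},
    {"nappies", "beer"},
    {"nappies", "milk"},
    {"beer", "crisps"},
    {"milk", "bread"},
]


def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated chance of `consequent` given `antecedent` is in the basket."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


def linear_forecast(values, steps_ahead=1):
    """Fit a least-squares line through (0, v0), (1, v1), ... and extrapolate."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


# Association: how often do nappy buyers also buy beer?
print(confidence(BASKETS, {"nappies"}, {"beer"}))

# Forecasting: extend a steadily rising series by one period.
print(linear_forecast([10, 12, 14, 16]))
```

A high confidence only shows that the two purchases tend to occur together in this data; it says nothing about why.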

4 key statistical techniques (CCND):
  1. Clustering - group items with similar characteristics
  2. Classification techniques - assign people to predetermined classes based on their profile data
  3. Neural networks - non-linear predictive models, adjust weighting through 'training'
  4. Decision trees - decision points governed by rules (follow paths to solution)
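
As a rough illustration of two of the techniques listed above, the Python sketch below runs a very small one-dimensional k-means clustering and then follows a hand-written decision tree. Everything in it is an assumption made for the example: the income figures, the starting centroids and the function names are invented, and the tree's rules are simply the sports car example from these notes written out by hand, not rules learned from data.

```python
def k_means_1d(values, centroids, iterations=10):
    """Very small k-means for single numbers (e.g. incomes in £k)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value with its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


def likely_sports_car_buyer(age, married, income_k):
    """Hand-written decision tree: each `if` is one decision point."""
    if age < 20:
        return False
    if married:
        return False
    return income_k >= 50


# Clustering: two income groups emerge from hypothetical data.
incomes = [18, 22, 25, 48, 52, 55]
centres, groups = k_means_1d(incomes, centroids=[20, 50])
print(groups)

# Decision tree: follow the path for one customer profile.
print(likely_sports_car_buyer(age=30, married=False, income_k=60))
```

Clustering finds the groups without being told what they are, whereas the decision tree (like classification) assigns each person to a class that was decided in advance.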
Data mining software from NeoVista could be used to refine inventory stock levels, predict those with health problems or optimise store layouts.
