Saturday, May 17, 2008

Setting up a database (MR, unit 3)

2 types of database:
  1. Flat file system - single worksheet, all data in one file
  2. Relational database - linked tables, more efficient as user can interrogate multiple tables and generate an integrated report
Storage structure: fields (labels for types of data), records (collections of fields relevant to one entry), tables (collections of records describing similar data), databases (collections of tables relating to particular set of information)

Data cleansing
ensuring information is correct, properly formatted and not duplicated
Cleansing new data (FVV)

Form elements: introduce pre-defined responses
  • radio buttons
  • check boxes
  • list-boxes
Validation: apply pre-programmed tests to ensure types data input is reasonable
  • format checks - numeric, alphabetical or combi / min & max characters
  • range checks - test data within appropriate range (e.g. Feb 31 doesn't exist)
  • existence checks - compares input data with other data in system to stop duplication or deliberate
  • completeness checks - ensure all required fields completed
Verification: compares input data with source document (postcodes & automatic street names can trigger checking process)

Cleansing imported data
Format & consistency: sensitivity to format differences, essential for high-speed analysis. XML can alleviate.
Deduplication: duplicating will annoy customers. Happens through acquisition from different databases, people inconsistent. Easier in B2C by comparing common fields. Harder in B2B where different businesses can have same location or the same business, multiple locations

OVERKILL - delete all but most recent
UNDERKILL - rely on customers to tell them of changes


Database maintenance
Should be regular and systemic. Add new fields, updates, remove names with no response, check returned mail. delete any request to insubscribe

No comments: