Data Management (SBI220)

10 credits

Aim of this module

The aim of this rotation is to provide trainees with an overview of the key elements of managing public health data and the impact of data management on patients and the public. This will include areas such as data governance and basic systems development. Effective public health action requires an informed understanding of both the data and the topic area, underpinned by a sound scientific interpretation of the evidence. Such evidence must frequently be transformed from raw data into consumable information before it can be used for making decisions, determining policy, and conducting and evaluating public health programmes. High-quality, accurate public health data support the development of public health policy and strategy and the development and introduction of public health programmes, and ultimately improve health outcomes. The aim of this module is to enable trainees to develop their knowledge and understanding of data management and to apply their skills to ensure that data are governed appropriately and managed in accordance with legislation and good practice guidelines.

  1. Document and design a specification for a relational database for collecting or storing health data, ensuring compliance with security, governance and ethical issues.
  2. Extract, import and manipulate data within a data set.
  3. Draft a report summarising the quality of the data, make recommendations required to improve the data quality and agree an action plan.
Number | Work-based learning outcome | Title | Knowledge
1 | 1 | Liaise with an information manager to identify how a database specification should be written. |
2 | 1 | Define the user requirements for the relational database, including purpose, scope, structure and security. |
3 | 1 | Identify the key elements of the database’s design, including relationships and keys/indexes and techniques to provide quality assurance. |
4 | 1 | Produce appropriate documentation to define the system. |
5 | 2 | Identify a database containing health or exposure data and extract the relevant fields from multiple tables within the database to answer a specific question using Structured Query Language (SQL). |
6 | 2 | Import data into software which allows further data analysis, e.g. R, SQL. |
7 | 2 | Run queries to identify quality issues, including coding anomalies, incomplete data and general data inaccuracies. |
8 | 2 | Resolve issues identified where possible. |
9 | 2 | Identify relevant fields and de-duplicate data. |
10 | 2 | Manipulate the data to produce aggregated counts. |
11 | 3 | Produce a short report describing and evaluating the quality issues identified. |
12 | 3 | Propose recommendations to resolve or mitigate the data quality issues. |
13 | 3 | Present findings to colleagues, defend the recommendations and agree an action plan. |
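
As an illustration of outcomes 5, 7, 9 and 10 above, the following is a minimal sketch in Python using the standard-library sqlite3 module and an in-memory dummy database. The table names (patient, test_result), field names, codes and data are all hypothetical and invented for illustration; a real health database, and the database product used in a given workplace, will differ.

```python
import sqlite3

# Hypothetical dummy data only: no real patient records are used.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    sex        TEXT,
    region     TEXT
);
CREATE TABLE test_result (
    result_id   INTEGER PRIMARY KEY,
    patient_id  INTEGER REFERENCES patient(patient_id),
    sample_date TEXT,
    organism    TEXT
);
INSERT INTO patient VALUES
    (1, 'F', 'North'), (2, 'M', 'North'), (3, 'X', 'South'), (4, 'F', NULL);
INSERT INTO test_result VALUES
    (10, 1, '2023-01-05', 'Salmonella'),
    (11, 1, '2023-01-05', 'Salmonella'),   -- duplicate of result 10
    (12, 2, '2023-01-07', 'Salmonella'),
    (13, 3, '2023-02-01', 'Campylobacter'),
    (14, 4, '2023-02-03', '');             -- empty string rather than NULL
""")

# Outcome 7: flag coding anomalies and incomplete data in the patient table.
quality_issues = conn.execute("""
    SELECT patient_id,
           CASE WHEN sex NOT IN ('F', 'M') THEN 'unexpected sex code'
                WHEN region IS NULL        THEN 'missing region'
           END AS issue
    FROM patient
    WHERE sex NOT IN ('F', 'M') OR region IS NULL
    ORDER BY patient_id
""").fetchall()

# Outcomes 5, 9 and 10: join the two tables, de-duplicate on the fields that
# define a distinct result, then aggregate to counts per region and organism.
counts = conn.execute("""
    WITH distinct_results AS (
        SELECT DISTINCT patient_id, sample_date, organism
        FROM test_result
        WHERE organism IS NOT NULL AND organism <> ''
    )
    SELECT p.region, d.organism, COUNT(*) AS n
    FROM distinct_results AS d
    JOIN patient AS p ON p.patient_id = d.patient_id
    GROUP BY p.region, d.organism
    ORDER BY p.region, d.organism
""").fetchall()

print(quality_issues)  # [(3, 'unexpected sex code'), (4, 'missing region')]
print(counts)          # [('North', 'Salmonella', 2), ('South', 'Campylobacter', 1)]
```

Which fields define a duplicate is a judgement made here purely for the example (same patient, sample date and organism); in practice that rule would be agreed and documented as part of the data quality work.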

This module has no work-based assessments.

Important information

The academic parts of this module will be detailed and communicated to you by your university. Please contact them if you have questions regarding this module and its assessments. The module titles in your MSc may not be identical to the work-based modules shown in the e-portfolio. Your modules will be aligned, however, to ensure that your academic and work-based learning are complementary.

Learning Outcomes

  1. Describe the key elements and purpose of data sharing/access agreements to protect patients and the public.
  2. Discuss security issues around client-server systems and system security in the context of NHS data governance and ethical concerns.
  3. Describe the key elements of databases, including purpose, scope and application.
  4. Identify the key elements and importance of database design.
  5. Describe the principles and steps of data linkage.
  6. Manipulate, manage and quality assure data within a data set.
  7. Discuss the importance of data quality and explain how to mitigate and improve it.
  8. Interrogate an SQL database.
  9. Describe the principles of data warehousing.
  10. Summarise the concept of data mining.

Indicative Content

  • Describe the key elements and purpose of data sharing/access agreements to protect patients and the public
    • Patient confidentiality
    • Caldicott/Caldicott Guardians
    • Legal issues, e.g. legislation (Data Protection Act), Information Commissioner’s Office, impact of breaches
    • Data sharing agreements
    • NHS Information Governance Toolkit – overview, how it helps safeguard data
    • Systems security – user access controls, firewalls, encryption (s/w, h/w, database)
    • Pseudonymisation
    • System level security policy
    • Data flow diagrams
    • Risk assessment/Patient identifiable information
    • Disaster recovery/resilience
    • Secure information exchange
    • Freedom of information requirements
  • Discuss security issues around client-server systems and system security in the context of NHS data governance and ethical concerns
    • Logins and access to databases, server/database roles and in-database roles (access to specific tables, etc.)
    • Database logs
    • Disaster recovery
    • Principles of information governance and awareness of the safe and effective use of health and social care information
    • Recognise and respond appropriately to situations where it is necessary to share information to safeguard service users or the wider public
    • The need to manage records and all other information in accordance with applicable legislation, protocols and guidelines
  • Describe the key elements of databases, including purpose, scope and application
    • What is a database and why is it needed?
    • Kinds of database (relational, document, graph)
    • Products – SQL Server, Oracle, Access
    • What can we do with them? (lookup, analysis, online transaction processing [OLTP], online analytical processing [OLAP], machine learning, stream analytics)
  • Describe the key elements and importance of database design
    • Creating the database – considerations on size, logical structure for data storage
    • Structure – tables, views, joins, primary/foreign keys
    • Field types
    • Nulls and empty strings – how do we deal with missing data?
    • Normalisation (to save space and not repeat data, primary/foreign keys)
    • Indexes
    • Importance of documentation
    • Importance of using standardised coding
  • Describe the principles and steps of data linkage
    • What it is, why we need to do it and principles
    • Common problems
  • Manipulate, manage and quality assure data within a data set
    • Variables – types, numeric formats, decimals, date and time, string
    • Getting data into and out of programs
    • Documentation commands – labels
    • Calculations – generate and replace, recoding, checking correctness, missing data
    • Data structure – selecting observations and variables, renaming and reordering, sorting, collapsing data, combining files
    • Data entry – folders, filenames, variable names, error prevention
    • Approaches and methods to ensure reproducibility of analyses and outputs
  • Data quality
    • Principles: completeness, accuracy, validity, timeliness, consistency
    • Standards
    • Data validation
    • Data dictionaries, standards and coding
  • Interrogate an SQL database
    • Using a dummy data set
    • Importing data
    • Import and Export Wizard
    • SQL insert, update, delete
    • Querying the data
    • SQL Server Management Studio
    • Select, Where, Group by, Order by, Count
    • Joins
    • Formats and conversion
    • Handling nulls
    • Query efficiency
    • ODBC – Access, Excel, EpiData, R, etc.
  • Describe the principles of data warehousing
    • Definition of data warehouse
    • Application
    • Process: data cleaning, data integration and data consolidation
  • Summarise the concept of data mining
    • Use and application of data mining
    • Data mining tools
    • Steps in data mining: change detection, dependency modelling, clustering, classification, regression, summarisation, results validation
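
The pseudonymisation and data linkage items in the indicative content above can be illustrated together. The following is a minimal sketch in Python, assuming a keyed hash (HMAC-SHA256) as the pseudonymisation technique and exact-match (deterministic) linkage on the resulting pseudonym. This is one possible approach chosen for illustration, not a statement of how any NHS system works; the key, function names, field names and identifiers are all hypothetical, and real linkage often has to handle inexact matches as well.

```python
import hashlib
import hmac

# Hypothetical secret key. In practice a key would be held securely and
# separately from the data, never hard-coded in a script.
SECRET_KEY = b"example-key-for-illustration-only"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest of a normalised identifier."""
    normalised = identifier.replace(" ", "").strip()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identifier(record: dict) -> dict:
    """Replace the identifying field with its pseudonym."""
    rest = {k: v for k, v in record.items() if k != "id_number"}
    return {"pseudo_id": pseudonymise(record["id_number"]), **rest}


# Two hypothetical extracts; the identifiers are made up and are formatted
# differently in each source, a common problem in data linkage.
lab_records = [
    {"id_number": "000 000 0001", "organism": "Salmonella"},
    {"id_number": "000 000 0002", "organism": "Campylobacter"},
]
admissions = [
    {"id_number": "0000000001", "admitted": "2023-01-06"},
]

lab_pseudo = [strip_identifier(r) for r in lab_records]
adm_pseudo = [strip_identifier(r) for r in admissions]

# Deterministic linkage: keep lab records whose pseudonym also appears
# in the admissions extract, and attach the admission date.
adm_by_id = {r["pseudo_id"]: r for r in adm_pseudo}
linked = [
    {**lab, "admitted": adm_by_id[lab["pseudo_id"]]["admitted"]}
    for lab in lab_pseudo
    if lab["pseudo_id"] in adm_by_id
]

print(len(linked))  # 1
```

Normalising the identifier before hashing matters in this sketch: without it, the two differently formatted copies of the same identifier would produce different pseudonyms and the records would fail to link.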