IT for Advanced Bioinformatics Applications (SBI126)

10 credits

Aim of this module

The volume of data generated by new functional genomics and next-generation sequencing (NGS) methodologies is unprecedented in medicine. Capturing and integrating this data so that it can be used effectively requires solutions beyond those typically used in clinical medicine. The trainee will be introduced to modern computational methodologies for handling and integrating large data sets. This will involve developing a good understanding of data description standards (through ontologies) and data federation methodologies. Workflow systems will be introduced as tools for industrial-scale bioinformatics analyses, and cloud-based computing solutions will be discussed as a way of extending the compute resource available within the NHS. A strong focus will be placed on the ethical and governance issues raised by using such technologies within an NHS setting.

In this module trainees will be expected to use computational methodologies for handling and integrating large data sets in accordance with data description standards (through ontologies) and data federation standards. They will be expected to use and design dynamic systems as tools for industrial-scale bioinformatics analyses, taking account of the ethical and governance issues raised by using such technologies within an NHS setting from the perspective of the patient, the clinical department and the organisation.

  1. Identify a clinical and/or laboratory bioinformatics requirement and develop, validate and deploy a bespoke workflow for clinical or public health analysis.
Number  Work-based learning outcome  Title  Knowledge
1  1  Identify a task capable of automated workflow analysis.
2  1  Carry out a feasibility study, discuss user requirements, develop a detailed requirements specification with key stakeholders, then validate the specification and gain authorisation.
3  1  Perform, evaluate and present an options appraisal of technology and resource options.
4  1  Develop, test and evaluate the workflow and perform user acceptance procedures.
5  1  Deploy the workflow, including maintenance and upgrading, ensuring compliance with quality assurance procedures, including version control.
6  1  Finalise workflow documentation and file in accordance with local standard operating procedures.
7  1  Train workflow users and assess the effectiveness of the training.

You must complete
2 Case-based discussion(s)
2 of the following DOPS / OCEs
Assessment Title Type
Demonstrate the data backup procedure in accordance with departmental protocols DOPS
Demonstrate techniques (checksums, etc.) for testing the integrity of data transfer DOPS
Organise and oversee deployment of a software update DOPS
Create a secure backup of an NGS dataset ensuring data integrity DOPS
Organise and oversee deployment of software DOPS
Present a strategy for an identifiable clinical bioinformatics requirement to a team of professionals OCE
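
As an illustration of the checksum technique named in the assessments above, the following is a minimal Python sketch and not a departmental protocol. It assumes SHA-256 is an acceptable digest locally; the function names `sha256_of_file` and `transfer_is_intact` are hypothetical and chosen only for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large NGS files never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(source_path, copy_path):
    """Compare the digest of the original file with that of its transferred copy."""
    return sha256_of_file(source_path) == sha256_of_file(copy_path)
```

In practice the digest of the source file would usually be recorded before transfer and compared with a digest computed at the destination, so that the two files never need to be on the same machine.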

Important information

The academic parts of this module will be detailed and communicated to you by your university. Please contact them if you have questions regarding this module and its assessments. The module titles in your MSc may not be exactly identical to the work-based modules shown in the e-portfolio. Your modules will be aligned, however, to ensure that your academic and work-based learning are complementary.

Learning Outcomes

  1. Describe basic cloud computing infrastructure.
  2. Describe the philosophy behind minimum information standards used to capture functional genomics data.
  3. Describe international data repositories for genetic and functional genomics data.
  4. Discuss the basic principles of ontologies for describing metadata.
  5. Describe the use of ontologies for capturing disease phenotype information.
  6. Discuss strategies for genetic data analysis over large-scale heterogeneous data.
  7. Describe a range of modern computational workflow systems.
  8. Discuss the application of workflow systems to NGS analysis.
  9. Discuss issues of data quality in medicine.
  10. Discuss the importance of data quality for patient safety.
  11. Describe the ethical and governance regulations relating to data capture in the NHS.
  12. Describe the ethical and governance concerns regarding data integration in the NHS.
  13. Describe basic principles of data encryption and international data encryption standards in medicine.
  14. Discuss the importance of information governance for patient safety.

Indicative Content

Computational infrastructure

  • Data encryption and data encryption standards
  • Governance and security issues for large data in the NHS
  • Basic cloud computing architectures (software as a service, compute as a service, etc.)
  • Public and private cloud architectures (including commercial systems such as Azure and EC2)
  • A basic introduction to workflows in computer science
  • An introduction to workflow tools (Taverna, Galaxy, etc.)

Functional genomics and genomics data sets

  • The concept of metadata
  • The role of minimum information standards to allow effective sharing
  • Tools to capture minimal information data (XML)
  • An introduction to ontologies
  • Community annotation through ontology
  • Interoperating with ontologies
  • Strategies for large-scale data integration
  • The pros and cons of data warehouses versus data integration over distributed heterogeneous data
  • Examples of ontology-driven data integration
  • Examples of data warehouses for genomic integration (Ensembl) 
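
The idea of ontology-driven data integration listed above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the `IS_A` hierarchy is hand-written for the example and is not taken from any published ontology, and the record identifiers and function names are hypothetical.

```python
# Toy "is_a" hierarchy: each term maps to its direct parent terms.
# Hand-written for illustration; not drawn from a published ontology.
IS_A = {
    "dilated cardiomyopathy": ["cardiomyopathy"],
    "cardiomyopathy": ["heart disease"],
    "arrhythmia": ["heart disease"],
    "heart disease": ["disease"],
    "disease": [],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a links."""
    found = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a.get(parent, []))
    return found


def records_annotated_with(term, annotations, is_a=IS_A):
    """Select records annotated with `term` or with any more specific term."""
    return sorted(
        record_id
        for record_id, annotated in annotations.items()
        if annotated == term or term in ancestors(annotated, is_a)
    )


# Hypothetical records from two sources, annotated at different levels of detail.
annotations = {
    "siteA-001": "dilated cardiomyopathy",
    "siteA-002": "arrhythmia",
    "siteB-101": "cardiomyopathy",
}
matches = records_annotated_with("cardiomyopathy", annotations)
```

Because both sources annotate against the same hierarchy, a query for the broader term retrieves records annotated with the broader term and with its more specific child, even though the two sites recorded different levels of detail.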


  • The basic theory of computational workflows
  • The architecture of workflow systems
  • Examples of workflows in genetics (Galaxy assembly of NGS data)
  • Analysis of current literature on data integration and workflows in genetics and medicine
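
The basic theory of computational workflows listed above is commonly described in terms of a directed acyclic graph of steps, where each step runs only after the steps it depends on. The following minimal Python sketch illustrates that idea using the standard library's `graphlib` module (Python 3.9 or later). It does not represent how Taverna or Galaxy are implemented; the step names, the toy reads and the `run_workflow` function are assumptions made for this example.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, dependencies):
    """Run each step once, after all of the steps it depends on.

    steps: maps a step name to a function taking a dict of upstream results.
    dependencies: maps a step name to the set of step names it depends on.
    """
    results = {}
    # static_order() yields every step after all of its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = steps[name](upstream)
    return order, results


# Hypothetical three-step workflow over toy sequence reads.
steps = {
    "load": lambda upstream: ["ACGT", "AC", "GGTTA"],
    "filter": lambda upstream: [r for r in upstream["load"] if len(r) >= 4],
    "report": lambda upstream: {"reads_kept": len(upstream["filter"])},
}
dependencies = {"load": set(), "filter": {"load"}, "report": {"filter"}}

order, results = run_workflow(steps, dependencies)
```

Declaring the dependencies separately from the step functions is the design choice that lets a workflow system validate, re-run or parallelise steps without changing the analysis code itself.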