# Seminar with John Hawkins

When: Tuesday, 15th March, 10am

Where: 78-420 (Rm420, GPSouth Building)

Speaker: John Hawkins

Title: Recurrent Neural Network Architectures for Predicting the Fates of Proteins

Abstract:

The selection of machine learning techniques requires a certain sensitivity to the requirements of the problem. In particular, the problem can be made more tractable by deliberately using algorithms that are biased towards solutions of the requisite kind. The central hypothesis of this thesis will be that recurrent architectures have a natural bias towards the general problem domain of which biological sequence tasks are a subset. The central goal of the project will be to gather evidence for this hypothesis by applying recurrent networks to problems of classifying protein fates. A case study performed on the prediction of protein subcellular localisation indicated that not only are recurrent networks suited to the task, but that as the patterns within the sequence become more ambiguous, the choice of specific recurrent architecture becomes more critical. Thus, a subsidiary hypothesis emerged that by refining the recurrent architecture it is possible to tune it to the specific demands of the problem. Recurrent neural networks have the added benefit that they are amenable to a number of post-training analysis techniques, particularly finite state automata extraction and dynamical analysis of state node activations. These techniques allow for a greater insight into both the nature of the problem and the manner in which it was solved than most machine learning approaches provide.

Thus, the final objective of the project will be to use the successful classifiers to perform an analysis of the relationship between the machines and the problem, to draw out insights into the corresponding biological processes.
