If you had to choose one single idea to work on for the rest of your life, what would it be?
In 2003, my grandma passed away with severe dementia. At the time, no technology existed to help communicate her daily needs, or her whereabouts, so that a carer could attend to her swiftly. It was a struggle for my family to provide her with the end-of-life care that she deserved.
For the past decade, my research has centred on navigation and tracking, with an emphasis on contact tracing and healthcare monitoring. I have developed novel, practical systems that combine ubiquitous mobile sensors with machine learning.
Since 2017, I have published on average 3 conference papers and journal articles a year, most of them as the lead author. See below for a full list of my publications.
(Winner of the Best Paper Award)
In the era of datasets of unprecedented sizes, data compression techniques are an attractive approach for speeding up machine learning algorithms. One of the most successful paradigms for achieving good-quality compression is that of coresets: small summaries of data that act as proxies for the original input data. Even though coresets have proved extremely useful for accelerating unsupervised learning problems, applying them to supervised learning problems may bring unexpected computational bottlenecks.
We show that this is the case for Logistic Regression classification, and hence propose two methods for accelerating the computation of coresets for this problem. When coresets are computed using our methods on three public datasets, computing the coreset and learning from it is, in the worst case, 11 times faster than learning directly from the full input data, and 34 times faster in the best case. Furthermore, our results indicate that our acceleration approaches do not degrade the empirical performance of coresets.
Support Vector Machine (SVM) is a powerful paradigm that has proven to be extremely useful for the task of classifying high-dimensional objects. In principle, SVM allows us to train scoring classifiers, that is, those that output a prediction score; however, it can also be adapted to produce probability-type outputs through the use of the Venn-Abers framework. This allows us to obtain valuable information on the label distribution for each test object. This procedure, however, is restricted to very small datasets given its inherent computational complexity.
We circumvent this limitation by borrowing results from the field of computational geometry. Specifically, we make use of the concept of a coreset: a small summary of data that is constructed by discretising the feature space into enclosing balls, so that each ball will be represented by only one point.
Our results indicate that training Venn-Abers predictors using enclosing balls is on average 8 times faster than the regular Venn-Abers approach, while largely retaining probability calibration. These encouraging results imply that we can still enjoy well-calibrated probabilistic outputs for kernel SVM even in the realm of large-scale datasets.
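The enclosing-ball idea can be illustrated with a minimal sketch. It assumes a simple greedy cover with a fixed radius, which is not necessarily the construction used in the paper; the function name `ball_coreset` and the `radius` parameter are hypothetical.

```python
import math

def ball_coreset(points, radius):
    """Greedy cover (illustrative assumption, not the paper's construction).

    Each ball of the given radius is represented by a single centre.
    Returns a list of (centre, weight) pairs, where weight is the number
    of input points that fell inside that centre's ball.
    """
    centres = []  # one representative point per ball
    weights = []  # how many points each representative stands for
    for p in points:
        for i, c in enumerate(centres):
            if math.dist(p, c) <= radius:
                weights[i] += 1  # p is absorbed by an existing ball
                break
        else:
            centres.append(p)    # p opens a new ball
            weights.append(1)
    return list(zip(centres, weights))

data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
summary = ball_coreset(data, radius=0.5)
print(summary)
```

A downstream learner would then be trained on the weighted centres instead of the full data. In a classification setting, the cover would presumably be built per class so that a representative never mixes labels.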
The continual proliferation of mobile devices has encouraged much effort in using smartphones for indoor positioning.
This article reviews the most recent and interesting smartphone-based indoor navigation systems, ranging from electromagnetic to inertial to visible light ones, with an emphasis on their unique challenges and potential real-world applications.
A taxonomy of smartphone sensors is introduced, which serves as the basis for categorising the different positioning systems under review. A set of evaluation criteria is devised. For each sensor category, the most recent, interesting and practical systems are examined, with detailed discussion of the open research questions for academics and the practicality for potential clients.
(Impact factor: 1.36; H-index: 22)
Contact tracing is widely considered an effective procedure in the fight against epidemic diseases. However, one of the challenges for technology-based contact tracing is the high number of false positives, which calls into question its trustworthiness and efficiency amongst the wider population for mass adoption.
To this end, this paper proposes a novel yet practical smartphone-based contact tracing approach that employs WiFi and acoustic sound for relative distance estimation, in addition to ambient air pressure and magnetic field environment matching. We present a model combining 6 smartphone sensors, prioritising some of them when certain conditions are met.
We empirically verified our approach in various realistic environments, demonstrating up to 95% fewer false positives and 62% higher accuracy than a Bluetooth-only system. To the best of our knowledge, this paper was one of the first works to propose a combination of smartphone sensors for contact tracing.
(Impact factor: 3.43; H-index: 132)
Passengers travelling on the London underground tubes currently have no means of knowing their whereabouts between stations. The challenge in providing such a service is that the London underground tunnels have no GPS, WiFi, Bluetooth or any other kind of terrestrial signal to leverage.
This paper presents a novel yet practical idea to track passengers in real time using the smartphone accelerometer and a training database of the entire London underground network. Our rationale is that London tubes are self-driving transports with predictable accelerations, decelerations and travelling times, and that they always travel on the same fixed rail lines between stations with distinctive bumps and vibrations, which permits us to generate an accelerometer map of the tubes' movements on each line. Given the passenger's accelerometer data, we identify in real time which line they are travelling on, and which station they departed from, using a pattern matching algorithm, with an accuracy of up to about 90% when the sampling length is equivalent to at least 3 station stops. We incorporate Principal Component Analysis to perform inertial tracking of the passenger's position along the line when trains deviate from their scheduled movements during rush hours.
Our proposal was painstakingly assessed on the entire London underground covering approximately 940 kilometres of travelling distance, spanning across 381 stations on 11 different lines.
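As an illustration of matching a passenger's accelerometer trace against a pre-recorded map, here is a minimal sketch. It assumes a sliding-window normalised correlation over a one-dimensional signal, which may differ from the pattern matching algorithm used in the paper; `best_match` and the toy traces are hypothetical.

```python
import math

def _normalise(xs):
    """Centre a window on its mean and scale it to unit length."""
    mean = sum(xs) / len(xs)
    centred = [x - mean for x in xs]
    norm = math.sqrt(sum(c * c for c in centred))
    return [c / norm for c in centred] if norm > 0 else centred

def best_match(reference, query):
    """Return (offset, score) of the reference window most correlated with query."""
    q = _normalise(query)
    best_offset, best_score = None, -2.0  # correlation is always >= -1
    for offset in range(len(reference) - len(query) + 1):
        window = _normalise(reference[offset:offset + len(query)])
        score = sum(a * b for a, b in zip(window, q))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

# Toy reference trace for one line, and a query that has the same shape as
# the bump at offset 2 but a different amplitude.
reference = [0, 0, 1, 3, 1, 0, 0, -2, -3, -2, 0, 0]
query = [2, 6, 2]
offset, score = best_match(reference, query)
print(offset, round(score, 3))
```

Normalising each window makes the comparison insensitive to amplitude, which matters when different phones report the same vibration at different scales.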
We demonstrate a breach in smartphone location privacy through the accelerometer and magnetometer's footprints. The merits or otherwise of explicitly permissioned location sensors are not the point of this paper. Instead, our proposition is that other non-location-sensitive sensors can track users accurately when the users are in motion, as in travelling on public transport, such as trains, buses, and taxis.
Through field trials, we provide evidence that high accuracy location tracking can be achieved even via non-location-sensitive sensors for which no access authorisation is required from users on a smartphone.
As the volume of data increases rapidly, most traditional machine learning algorithms become computationally prohibitive. Furthermore, the available data can be so big that it easily exceeds the memory of a single machine.
We propose Coreset-Based Conformal Prediction, a strategy for dealing with big data by applying conformal predictors to a weighted summary of the data, namely the coreset. We compare our approach against stand-alone inductive conformal predictors over three large competition-grade datasets to demonstrate that our coreset-based strategy may not only significantly improve the learning speed, but also retain the validity of the predictions and the efficiency of the predictors.
Coughing and sneezing are the most common means of spreading respiratory diseases amongst humans. Existing approaches to detect coughing and sneezing events are either intrusive or do not provide any reliability measure.
This paper offers a novel proposal to reliably and non-intrusively detect such events using a smartwatch as the underlying hardware and Conformal Prediction as the underlying software.
We rigorously analysed the performance of our proposal with the Harvard ESC Environmental Sound dataset, and with real coughing samples taken from a smartwatch under different ambient noise conditions.
This report describes work in progress on analysing the ExCAPE data for the possibility of multi-target learning.
We start by observing the structure of the missing values (labels), as the sets of examples overlap but are not identical for different targets.
Then we concentrate on the part of the data with full information in order to consider the mutual dependence between the targets, and the possibility of improving prediction by pooling the information.
This report summarises the performance of Mondrian Inductive Conformal Prediction on the 10 largest targets in the ExCAPE dataset.
Out of 526 targets in the ExCAPE database, only the top 150 have more than 100,000 compounds. The number of compounds drops below 5,000 beyond the top 200.
(Nominated for the Best Paper Award)
Public transport provides an ideal environment for the transmission of contagious diseases.
This paper introduces a novel idea to detect the co-location of people in such environments using just the ubiquitous geomagnetic field sensor on the smartphone. Essentially, given that all passengers must share the same journey between at least two consecutive stations, we have a long window in which to match the user trajectory.
Our idea was assessed by a painstaking survey of over 150 kilometres of travelling distance, covering different parts of London, using the overground trains, the underground tubes and the buses.
The major challenges for optical-based tracking are the lighting conditions, the similarity of the scene, and the position of the camera.
This paper demonstrates that under such conditions, the positioning accuracy of Google's Tango platform may deteriorate from fine-grained centimetre level to metre level.
The paper proposes a particle-filter-based approach to fuse the WiFi signal and the magnetic field, neither of which is considered by Tango, and outlines a dynamic positioning selection module to deliver a seamless tracking service in these challenging environments.
(Impact factor: 0.46; H-index: 16)
WiFi fingerprinting has been a popular approach for indoor positioning in the past decade. However, most existing fingerprint-based systems were designed as an on-demand service to guide users to their desired destination.
This article introduces a novel feature that allows the positioning system to predict in advance which walking route the user may take, and the potential destination. To achieve this goal, a new so-called routine database is used to store the magnetic field strength in the form of training sequences that represent the walking trajectories. The benefit of the system is that it does not commit to a single predicted trajectory. Instead, it dynamically adjusts the prediction as more data are revealed throughout the user's journey. The proposed system was tested in a real indoor environment to demonstrate that it not only successfully estimated the route and the destination, but also improved the single positioning prediction.
Indoor navigation provides a positioning service to indoor users where GPS coverage is not available. The challenges for most signal-based indoor positioning systems are the unpredictable signal propagation caused by complex building interiors, and the dynamics of the environment caused by people's movements. However, most existing systems give no indication of the quality of their predictions, which is crucial in such noisy indoor environments.
To address this challenge, this article proposes a confidence measure to reflect the uncertainty of the positioning prediction. More importantly, users may control the size of the prediction set by setting the confidence level to suit their personal requirements. The proposed approach has been validated in three real office buildings with challenging indoor environments, where it was up to 20% more accurate than the traditional Naïve Bayes and Weighted K-nearest neighbours (W-KNN) algorithms.
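How a confidence level controls the size of the prediction set can be sketched with a standard inductive conformal p-value. This is a generic illustration under assumed inputs, not the article's exact algorithm: the nonconformity scores, the room names, and the function names `p_value` and `prediction_set` are all hypothetical.

```python
def p_value(calibration_scores, score):
    """Fraction of calibration scores at least as nonconforming as `score`,
    counting the candidate itself (standard inductive conformal p-value)."""
    at_least = sum(1 for s in calibration_scores if s >= score)
    return (at_least + 1) / (len(calibration_scores) + 1)

def prediction_set(calibration_scores, candidate_scores, confidence):
    """Keep every candidate location whose p-value exceeds 1 - confidence."""
    significance = 1.0 - confidence
    return {
        location
        for location, score in candidate_scores.items()
        if p_value(calibration_scores, score) > significance
    }

# Hypothetical nonconformity scores: larger means the candidate location
# looks stranger given the observed signal.
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
candidates = {"room_a": 0.25, "room_b": 0.75, "room_c": 0.95}

for confidence in (0.5, 0.8, 0.95):
    print(confidence, sorted(prediction_set(calibration, candidates, confidence)))
```

Raising the confidence level lowers the significance threshold, so more candidate locations survive and the prediction set grows; a user who needs a single room can accept a lower confidence level instead.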
Indoor localisation provides a positioning service to indoor users where GPS coverage is not available. Much research effort has been invested into 'Location Fingerprinting', which is considered one of the most effective indoor tracking methods to date. Fingerprint-based approaches piggyback on existing indoor communication layers such as the WiFi network to provide the location-based service. However, the challenges of fingerprinting are the huge training database, the dynamic indoor environment, and the difficulty WiFi fingerprints have in providing fine-grained positioning accuracy at certain indoor positions. This thesis addresses these problems using several machine learning algorithms and additional information observed from the users and the indoor environment.
The proposed approaches in this thesis have been validated in real offices with challenging indoor environments. One test bed has multiple buildings and floors, and was previously used in the EvAAL 2015 indoor positioning competition, which provides a relative baseline for the proposed techniques. In particular, the regression and classification algorithms in this thesis ranked second and third out of the 5 contestants when evaluated under the competition's test domain. In addition, they were up to 20% more accurate than the traditional Naive Bayes and W-KNN algorithms.
An epidemic may be controlled or predicted if we can monitor the history of physical human contacts. As most people have a smartphone, a contact between two persons can be regarded as a handshake between the two phones. Our task then becomes detecting the moment at which the two mobile phones are close.
In this paper, we investigate the possibility of using outdoor WLAN signals, provided by public Access Points, for off-line detection of mobile phone collisions. Our method does not require GPS coverage or real-time monitoring. We designed an Android app that runs in the phone’s background to periodically collect the outdoor WLAN signals. These data are then analysed to detect potential contacts. We also discuss several approaches to handling mobile phone diversity and the WLAN scanning latency issue. Based on our measurement campaign in the real world, we conclude that it is feasible to detect the co-location of two phones with the WLAN signals only.
One of the challenges of deploying location fingerprinting in real offices is the handling of the training database, which does not scale well as the tracking space to be covered grows. However, little attention has been paid to this issue, as the majority of previous work has focused on improving the tracking accuracy.
In this paper, we propose a novel idea to enhance fingerprinting's processing speed and positioning accuracy with mixture-of-Gaussians clustering. We recognise the key difference between fingerprinting and unsupervised problems: we do know the label (the Cartesian co-ordinate) of the signal data in advance. This key information was largely ignored in previous work, where the fingerprinting clustering was based solely on the signal data. By exploiting this information, we tackle indoor signal multipath and shadowing with two-level clustering: signal data clustering and Cartesian co-ordinate clustering.
We tested our approach in a real office environment with harsh indoor conditions, and concluded that our clustering scheme not only reduces the fingerprinting processing time, but also improves the positioning accuracy.
Indoor localisation helps monitor the position of a person inside a building, without GPS coverage. In the past decade, much research effort has been invested into Indoor Fingerprinting, which is considered one of the most effective indoor tracking methods to date.
In recent years, some researchers have started looking at crowdsourcing the fingerprinting database with contributions from indoor users via mobile phones or laptop PCs. However, the crowdsourcing process has been greatly limited by the lack of an indoor reference, in contrast to the widespread use of the GPS reference for outdoor crowdsourcing.
In this paper, we propose a novel idea to crowdsource the fingerprinting database without any preset infrastructure or landmarks, and without using any advanced sensors. Our idea is based on the observations that users often carry a mobile phone with them, and that there are multiple social contacts amongst those users indoors. First, we exploit the user's continuous movement indoors to refine the location prediction set; this approach can also be applied to enhance other systems. Second, we use a unique concept to detect indoor social contacts with NFC by tapping the backs of the two phones together. Third, we propose a novel idea to combine this social contact and the user's continuous movements to identify, with confidence, the exact entries in the fingerprinting database that need updating for crowdsourcing. Finally, we share our thoughts on automating the crowdsourcing process without any user input.
Current indoor localisation systems make use of common wireless signals such as Bluetooth and WiFi to track users inside a building. Amongst those, Bluetooth is widely known for its low power consumption, small maintenance cost, and widespread availability on commodity devices. Understanding the properties of such a wireless signal aids the design of the tracking system. However, little research has been done to understand the properties of the Bluetooth wireless signal amongst the current Bluetooth-based tracking systems.
In this chapter, the most important Bluetooth properties related to indoor localisation are experimentally investigated from a statistical perspective. A Bluetooth-based tracking system is proposed and evaluated with the location fingerprinting technique to incorporate the Bluetooth properties described in the chapter.
(Impact factor: 0.78; H-index: 49)
Indoor localisation is the state-of-the-art approach to identifying and observing a moving human or object inside a building. However, because of the harsh indoor conditions, current indoor localisation systems remain either too expensive or not accurate enough.
In this paper, we tackle the latter issue from a different direction, with a new conformal prediction algorithm to enhance the accuracy of the prediction. We handle the common indoor signal attenuation issue, which introduces errors into the training database, with a reliability measurement for our prediction. We show why our approach performs better than other solutions through empirical studies with two testbeds. To the best of our knowledge, we are the first to apply conformal prediction to localisation in general, and to indoor localisation in particular.
We previously proposed the first Conformal Prediction (CP) algorithm for indoor localisation with a classification approach. The algorithm can provide a region of predicted locations, and a reliability measurement for each prediction. However, one of the shortcomings of that approach was its individual treatment of each dimension. In reality, the training database usually contains multiple signal readings at each location, which can be used to improve the prediction accuracy.
In this paper, we enhance our former CP with the Kullback-Leibler divergence, and propose two new classification CPs. The empirical studies show that our new CPs perform slightly better than the previous CP when the resolution and density of the training database are high. However, the new CPs perform much better than the old CP when the resolution and density are low.
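To illustrate how multiple signal readings at a location can be compared as distributions, here is a minimal sketch of the Kullback-Leibler divergence between two smoothed histograms of readings. The integer RSSI bins, the additive smoothing, and the function name `kl_divergence` are assumptions made for this example; the paper's nonconformity measures are not reproduced.

```python
import math
from collections import Counter

def kl_divergence(p_samples, q_samples, bins, smoothing=1.0):
    """KL(P || Q) between two smoothed histograms over the same bins.

    Samples are assumed to take values that appear in `bins`; additive
    smoothing keeps every bin probability strictly positive.
    """
    p_counts, q_counts = Counter(p_samples), Counter(q_samples)
    p_total = len(p_samples) + smoothing * len(bins)
    q_total = len(q_samples) + smoothing * len(bins)
    divergence = 0.0
    for b in bins:
        p = (p_counts[b] + smoothing) / p_total
        q = (q_counts[b] + smoothing) / q_total
        divergence += p * math.log(p / q)
    return divergence

bins = list(range(-90, -39))          # hypothetical RSSI values in dBm
stored = [-60, -61, -60, -62]         # readings stored for one location
observed_near = [-60, -61, -60, -62]  # new readings taken at the same spot
observed_far = [-80, -82, -81, -80]   # new readings taken elsewhere

print(kl_divergence(observed_near, stored, bins))
print(kl_divergence(observed_far, stored, bins))
```

A small divergence suggests the observed readings resemble those stored for the location, so a quantity of this kind could plausibly serve as an ingredient of a nonconformity score.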
Indoor localisation is the state-of-the-art approach to identifying and observing a moving human or object inside a building. Location Fingerprinting is a cost-effective software-based solution that utilises the built-in wireless signal of the building to estimate the most probable position of a real-time signal reading. In this paper, we apply the Conformal Prediction (CP) algorithm to further enhance the Fingerprinting method. We design a new nonconformity measure with the Weighted K-nearest neighbours (W-KNN) as the underlying algorithm. Empirical results show good performance of the CP algorithm.
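For readers unfamiliar with W-KNN in fingerprinting, the following is a minimal sketch of the underlying position estimate only, not of the nonconformity measure designed in the paper. It assumes Euclidean distance in signal space and inverse-distance weights; the fingerprint values and the name `wknn_position` are hypothetical.

```python
import math

def wknn_position(fingerprints, reading, k=3, eps=1e-9):
    """Estimate (x, y) as the inverse-distance weighted mean of the k
    training fingerprints closest to `reading` in signal space.

    fingerprints: list of (signal_vector, (x, y)) pairs.
    """
    nearest = sorted(fingerprints, key=lambda fp: math.dist(fp[0], reading))[:k]
    # eps avoids division by zero when a stored fingerprint matches exactly.
    weights = [1.0 / (math.dist(signal, reading) + eps) for signal, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y

# Hypothetical training database: RSSI from two access points, and the
# Cartesian co-ordinate at which each fingerprint was recorded.
database = [
    ((-50.0, -70.0), (0.0, 0.0)),
    ((-70.0, -50.0), (10.0, 0.0)),
    ((-90.0, -90.0), (50.0, 50.0)),
]
x, y = wknn_position(database, reading=(-60.0, -60.0), k=2)
print(round(x, 3), round(y, 3))
```

The reading is equally close to the first two fingerprints, so the estimate falls halfway between their positions.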
The release of the Lego Mindstorms kit has carried the flexibility and creativity of Lego into the world of robotics, whilst targeting a wide audience of children and adults. To achieve this goal, a programming language called NXT-G was developed to give everyone full control of the Lego Mindstorms kit, regardless of their programming experience.
In this project, the programming language's ambition is tested through practical experiments. In a controlled experiment, twelve participants carry out four tasks using the NXT-G software and a Lego robot. Their performances are then analysed to assess the stated claim.
This thesis proposed and implemented a new affordable indoor tracking system. The Bluetooth signal was found to be very stable and reliable for indoor positioning. The Fingerprinting method was employed to make use of the Bluetooth signal recorded at many positions in the office room. In addition, a robot was created to perform the complex and time-consuming data collection process.
Spam (junk-email) identification is a well-documented research area. A good spam filter is not only judged by its accuracy in identifying spam, but also by its performance.
This project aims to replicate a Naive Bayesian spam filter, as described in the "SpamCop: A Spam Classification & Organization Program" paper. The accuracy and performance of the filter are examined with the GenSpam corpus. In addition, the project investigates the actual effect of the Porter Stemming algorithm on such a filter.
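The general shape of a Naive Bayesian spam filter can be shown in a minimal sketch. It assumes a multinomial model with add-one smoothing over pre-tokenised messages; it does not reproduce the SpamCop design, the GenSpam corpus, or Porter stemming, and the toy messages and function names are hypothetical.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (tokens, label) pairs, label being 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for tokens, label in messages:
        word_counts[label].update(tokens)
        doc_counts[label] += 1
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocabulary

def classify(model, tokens):
    """Return the label with the highest log-posterior for the tokens."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for token in tokens:
            # Add-one smoothing so unseen words do not zero the product.
            score += math.log((counts[token] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

training = [
    ("win cash now".split(), "spam"),
    ("cheap cash offer".split(), "spam"),
    ("meeting at noon".split(), "ham"),
    ("lunch at noon tomorrow".split(), "ham"),
]
model = train(training)
print(classify(model, "cash offer now".split()))
print(classify(model, "noon meeting".split()))
```

A stemmer would sit in front of `train` and `classify`, mapping each token to its stem before counting, which is where its effect on accuracy and speed could be measured.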
The thesis was one of the first to investigate the structure of the UK Chip & PIN debit and credit cards. Particularly, we looked into CAMs (Card Authentication Methods) implemented by different banks to understand their policies.
A Java-based reader was developed to exchange information with all UK debit and credit cards; some overseas cards were also tested. The software can simulate an ATM to perform off-line PIN verification.
My research is high-impact and practical, and as a young lecturer I still write code and collect data myself. This will benefit enthusiastic students who would like to apply their ideas in the real world.
If you are interested in doing research on such topics, feel free to get in touch.
I have had the pleasure of supervising the following students.
- Nery Riquelme-Granada (2018 - present)
co-supervised with Prof. Zhiyuan Luo.
Thesis: "Coresets-based Conformal Prediction".
- Robert Choudhury (2020 - present)
co-supervised with Prof. Zhiyuan Luo.
- Diego Toledano (2019 - 2020)
Thesis: "Solving 2048 using AI".
- Marsela Gavrilova (2019 - 2020)
Thesis: "Regression algorithms for learning".
All of my M.Sc students graduated with a Distinction degree.
- Dr. Ruth Blackwell (2019)
is now a Teaching Fellow at Royal Holloway University, UK.
Thesis: "WiFi Fingerprinting: three machine learning algorithms, six distance metrics and the UJIIndoorLoc database".
- Baldeep Singh (2019)
is now a Machine Learning Engineer at Kingston University, UK.
Thesis: "Machine Learning Based Indoor Positioning".
- Daniel Tirabasso (2019)
is now a CEO at Cogmatic, UK.
Thesis: "Dimensionality Reduction for Wi-Fi Fingerprinting based Indoor Localisation Systems".
- Arif Syed (2019)
is now a Machine Learning Engineer at InforceHub, UK.
Thesis: "Conformal predictors for detecting harmful respiratory events".
- Bharat Sikka (2019)
is now a Machine Learning Engineer at Spraxa Solutions, India.
Thesis: "Machine Learning Based Indoor Positioning Using Wi-Fi Fingerprinting".