A Multi-Scale Feature Selection Framework for WiFi Access Points Line-of-sight Identification

Xu Feng, Khuong An Nguyen, Zhiyuan Luo
Conference Paper IEEE Wireless Communications and Networking Conference (WCNC 2023).


Despite their high accuracy under ideal conditions, where there is a direct Line-of-Sight between the Access Points and the user, most WiFi indoor positioning systems struggle in Non-Line-of-Sight scenarios.

Thus, we propose a novel feature selection algorithm leveraging Machine Learning weighting methods and Multi-Scale selection, with WiFi RTT and RSS as the input signals.

We evaluate the algorithm's performance on a campus building floor. The results indicate a Line-of-Sight detection accuracy of 93% with 13 Access Points, using only 3 seconds of test samples at any moment, and an accuracy of 98% for individual AP detection.

A novel Deep Learning approach for one-step Conformal Prediction approximation

Julia Meister, Khuong An Nguyen, Stelios Kapetanakis, Zhiyuan Luo
Journal Article Annals of Mathematics and Artificial Intelligence | Springer.

(Impact factor: 1.154)


Deep Learning predictions with measurable confidence are increasingly desirable for real-world problems, especially in high-risk settings. The Conformal Prediction (CP) framework is a versatile solution that automatically guarantees a maximum error rate. However, CP suffers from computational inefficiencies that limit its application to large-scale datasets.

In this paper, we propose a novel conformal loss function that approximates the traditionally two-step CP approach in a single step. By evaluating and penalising deviations from the stringent expected CP output distribution, a Deep Learning model may learn the direct relationship between input data and conformal p-values.

Our approach achieves significant training time reductions of up to 86% compared to Aggregated Conformal Prediction (ACP), an accepted CP approximation variant. In terms of approximate validity and predictive efficiency, we carry out a comprehensive empirical evaluation to show our novel loss function's competitiveness with ACP on the well-established MNIST dataset.


Cough-based COVID-19 detection with audio quality clustering and confidence measure based learning

Alice Ashby, Julia Meister, Khuong An Nguyen, Zhiyuan Luo, Werner Gentzke
Conference Paper 11th Symposium on Conformal and Probabilistic Prediction with Applications (COPA 2022).

(Winner of the Best Paper Award)


COVID-19 cough classification has rapidly become a promising research avenue as an accessible and low-cost screening alternative, needing only a smartphone to collect and process cough samples. However, audio processing of recordings made in uncontrolled environments and prediction confidence are key challenges that need to be addressed before cough-screening could be widely accepted as a trusted testing method.

Therefore, we propose a novel approach for cough event detection that identifies cough clusters instead of individual coughs, significantly reducing onset detection's usual hypersensitivity to energy fluctuations between cough phases.

By using this technique to improve training sample quality and quantity by +200%, we improve Machine Learning performance on the minority COVID-19 class by up to 20%, achieving up to +47% precision and +15% recall. We propose a novel, class-agnostic Conformal Prediction non-conformity measure which takes the cough sample quality into account to counteract the variance caused by limiting segmentation to just the training set. Our Conformal Prediction model introduces uncertainty quantification to COVID-19 cough classification and achieves an additional 34% improvement to precision and recall.

Confident COVID-19 cough prediction on imbalanced data

Julia Meister, Khuong An Nguyen, Zhiyuan Luo
Poster Machine Learning for Healthcare | Institute of Physics.

(Winner of the Best Poster Award)


COVID cough data is heavily imbalanced, and it is challenging to collect more samples. Therefore, models are biased and their predictions cannot be trusted.

Thus, we propose a confidence measure for COVID-19 cough classification.

Audio feature ranking for sound-based COVID-19 patient detection

Julia Meister, Khuong An Nguyen, Zhiyuan Luo
Conference Paper 21st EPIA Conference on Artificial Intelligence | Springer.


Audio classification using breath and cough samples has recently emerged as a low-cost, non-invasive, and accessible COVID-19 screening method. However, no application has been approved for official use at the time of writing due to the stringent reliability and accuracy requirements of the critical healthcare setting.

To support the development of the Machine Learning classification models, we performed an extensive comparative investigation and ranking of 15 audio features, including less well-known ones. The results were verified on two independent COVID-19 sound datasets.

By using the identified top-performing features, we have increased the COVID-19 classification accuracy by up to 17% on the Cambridge dataset, and up to 10% on the Coswara dataset, compared to the original baseline accuracy without our feature ranking.

WiFi Access Points line-of-sight Detection for indoor positioning using the signal Round Trip Time

Xu Feng, Khuong An Nguyen, Zhiyuan Luo
Journal Article Remote Sensing | Volume 14, Issue 23 | MDPI.

(Impact factor: 5.349)


The emerging WiFi Round Trip Time measured by the IEEE 802.11mc standard promised sub-meter-level accuracy for WiFi-based indoor positioning systems, under the assumption of an ideal line-of-sight path to the user. However, most workplaces with furniture and complex interiors cause the wireless signals to reflect, attenuate, and diffract in different directions.

Therefore, detecting the non-line-of-sight condition of WiFi Access Points is crucial for enhancing the performance of indoor positioning systems. To this end, we propose a novel feature selection algorithm for non-line-of-sight identification of the WiFi Access Points.

Using the WiFi Received Signal Strength and Round Trip Time as inputs, our algorithm employs multi-scale selection and Machine Learning-based weighting methods to choose the optimal feature sets. We evaluate the algorithm on a complex campus WiFi dataset to demonstrate a detection accuracy of 93% for all 13 Access Points using 34 out of 130 features and only 3 s of test samples at any given time. For individual Access Point line-of-sight identification, our algorithm achieved an accuracy of up to 98%. Finally, we make the dataset available publicly for further research.

Assessing long-term medical remanufacturing emissions with Life Cycle Analysis

Julia Meister, Jack Sharp, Yan Wang, Khuong An Nguyen
Journal Article Processes | Special Issue on "Green Manufacturing and Sustainable Supply Chain Management" | MDPI.

(Impact factor: 3.352)


The unsustainable take-make-dispose linear economy prevalent in healthcare contributes 4.4% to global Greenhouse Gas emissions. A popular but not yet widely-embraced solution is to remanufacture common single-use medical devices like electrophysiology catheters, significantly extending their lifetimes by enabling a circular life cycle.

To support the adoption of catheter remanufacturing, we propose a comprehensive emission framework and carry out a holistic evaluation of virgin manufactured and remanufactured carbon emissions with Life Cycle Analysis (LCA). We followed ISO modelling standards and NHS reporting guidelines to ensure industry relevance.

We conclude that remanufacturing may lead to a reduction of up to 60% per turn (−1.92 kg CO2eq, burden-free) and 57% per life (−1.87 kg CO2eq, burdened). Our extensive sensitivity analysis and industry-informed buy-back scheme simulation revealed long-term emission reductions of up to 48% per remanufactured catheter life (−1.73 kg CO2eq). Our comprehensive results encourage the adoption of electrophysiology catheter remanufacturing, and highlight the importance of estimating long-term emissions in addition to traditional emission metrics.

An analysis of the properties and the performance of WiFi RTT for indoor positioning in non-line-of-sight environments

Xu Feng, Khuong An Nguyen, Zhiyuan Luo
Conference Paper 17th International Conference on Location Based Services (LBS 2022).


Indoor positioning systems based on WiFi Round-Trip Time (RTT) measurements are believed to deliver sub-metre level accuracy with trilateration, under ideal indoor conditions. However, the performance of WiFi RTT positioning in complex, non-line-of-sight environments remains a research challenge.

To this end, this paper investigates the properties of WiFi RTT in several real-world indoor environments on heterogeneous smartphones. We present a large-scale real-world dataset containing both RTT and received signal strength (RSS) signal measures with correct ground-truth labels.

Our results indicated that the RTT fingerprinting system delivered an accuracy below 0.75 m, which was 98% better than RSS fingerprinting and 166% better than RTT trilateration, which failed to deliver sub-metre accuracy as claimed.
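
For readers unfamiliar with RTT trilateration, the idea can be sketched as follows. This is a minimal illustration and not the code used in the paper: the function names, the 2D setting, the three-anchor layout and the noise-free ranges are all assumptions made for the example.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def rtt_to_distance(rtt_seconds):
    """Convert a round trip time to a one-way distance in metres."""
    return SPEED_OF_LIGHT_M_PER_S * rtt_seconds / 2.0


def trilaterate_2d(anchors, distances):
    """Estimate (x, y) from exactly three anchors and their ranges.

    Subtracting the first range equation from the other two removes the
    quadratic terms and leaves a 2x2 linear system, solved by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)


# Hypothetical Access Point positions (metres) and a hypothetical true position.
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
true_position = (1.0, 1.0)
ranges = [math.dist(true_position, a) for a in anchors]
estimate = trilaterate_2d(anchors, ranges)
```

In non-line-of-sight conditions the measured RTT typically overestimates the direct distance, so the ranges fed into such a solver become biased, which is consistent with the trilateration shortfall reported above.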

A novel cough audio segmentation framework for COVID-19 detection

Alice Ashby, Julia Meister, Goran Soldar, Khuong An Nguyen
Conference Paper Symposium on Open Data and Knowledge for a Post-Pandemic Era (ODAK 2022).


Despite its potential, Machine Learning has played little role in the present pandemic, due to the lack of data (i.e., there were not many COVID-19 samples in the early stage).

Thus, this paper proposes a novel cough audio segmentation framework that may be applied on top of existing COVID-19 cough datasets to increase the number of samples, as well as to filter out noise and uninformative data. We demonstrate the efficiency of our framework on two popular open datasets.

Communication-efficient Conformal Prediction for distributed datasets

Nery Riquelme-Granada, Zhiyuan Luo, Khuong An Nguyen
Extended Abstract 11th Symposium on Conformal and Probabilistic Prediction with Applications (COPA 2022).


Coresets have been proven useful in accelerating the computation of inductive conformal predictors (ICP) when the training data becomes large in size.

This work shows that coreset-based conformal predictors are not only computationally efficient in the centralised setting, but may also naturally be used in scenarios where the dataset of interest is inherently distributed.

Malware in motion

Robert Choudhury, Zhiyuan Luo, Khuong An Nguyen
Conference Paper 8th International Conference on Information Systems Security and Privacy (ICISSP 2022) | Springer.


Malicious software (malware) is designed to circumvent the security policy of the host device. Smartphones represent an attractive target to malware authors as they are often a rich source of sensitive information. Attractive targets for attackers are sensors (such as cameras or microphones) which allow observation of the victims in real time.

To counteract this threat, there has been a tightening of privileges on mobile devices with respect to sensors, with app developers being required to declare which sensors they need access to, as well as the users needing to give consent.

We demonstrate by conducting a survey of publicly accessible malware analysis platforms that there are still implementations of sensors which are trivial to detect without exposing the malicious intent of a program. We also show that, despite changes to the permission model, it is still possible to fingerprint an analysis environment, even when the analysis is carried out using a physical device, with the novel use of Android's Activity Recognition API.

Preface for the Proceedings of Machine Learning Research (Volume 179)

Ulf Johansson, Henrik Boström, Khuong An Nguyen, Zhiyuan Luo, Lars Carlsson
Preface Proceedings of Machine Learning Research (Volume 179).


This volume contains the Proceedings of the Eleventh Symposium on Conformal and Probabilistic Prediction with Applications (COPA 2022), hosted by University of Brighton, UK. The Symposium is held in Brighton on August 24–26, 2022.

Overall, 17 full papers have been accepted for publication in the Proceedings of Machine Learning Research, Volume 179.


A review of smartphones based indoor positioning: challenges and applications

Khuong An Nguyen, Zhiyuan Luo, Guang Li, Chris Watkins
Journal Article IET Cyber-systems and Robotics | Volume 3, Issue 1 | Wiley.


The continual proliferation of mobile devices has encouraged much effort in using the smartphones for indoor positioning.

This article is dedicated to reviewing the most recent and interesting smartphone-based indoor navigation systems, ranging from electromagnetic to inertial to visible light ones, with an emphasis on their unique challenges and potential real-world applications.

A taxonomy of smartphone sensors will be introduced, which serves as the basis to categorise different positioning systems for reviewing. A set of criteria to be used for evaluation will be devised. For each sensor category, the most recent, interesting and practical systems will be examined, with detailed discussion on the open research questions for the academics, and the practicality for the potential clients.

A survey of deep learning approaches for WiFi-based indoor positioning

Xu Feng, Khuong An Nguyen, Zhiyuan Luo
Journal Article Journal of Information and Telecommunication | Volume 6, Issue 3 | Taylor & Francis.

(Impact factor: 0.34)


One of the most popular approaches for indoor positioning is WiFi fingerprinting, which has been intrinsically tackled as a traditional machine learning problem since the beginning, achieving a few meters of accuracy on average.

In recent years, deep learning has emerged as an alternative approach, with a large number of publications reporting sub-meter positioning accuracy.

Therefore, this survey presents a timely, comprehensive review of the most interesting deep learning methods being used for WiFi fingerprinting. In doing so, we aim to identify the most efficient neural networks, under a variety of positioning evaluation metrics for different readers.

Coreset-based data compression for logistic regression

Nery Riquelme-Granada, Khuong An Nguyen, Zhiyuan Luo
Book Chapter Data Management Technologies and Applications | Springer | Pages 195-222 | ISBN: 978-3-030-83013-7.


The coreset paradigm is a fundamental tool for analysing complex and large datasets. Although coresets are used as an acceleration technique for many learning problems, the algorithms used for constructing them may become computationally exhaustive in some settings. We show that this can easily happen when computing coresets for learning a logistic regression classifier. We overcome this issue with two straightforward methods: Accelerating Clustering via Sampling (ACvS) and Regressed Data Summarisation Framework (RDSF). The former is an acceleration procedure based on a simple theoretical observation on using Uniform Random Sampling for clustering problems. The latter is a coreset-based data-summarising framework that builds on ACvS and extends it by using a regression algorithm as part of the coreset construction.

We tested both procedures on five public datasets, and observed that computing the coreset and learning from it is 11 times faster than learning directly from the full input data in the worst case, and 34 times faster in the best case. We further observed that the best regression algorithm for creating summaries of data using the RDSF framework is the Ordinary Least Squares (OLS).

Confidence Machine Learning for cutting tool life prediction

Nishant Wilson, Steve Barwick, Vince Booker, Tom Mildenhall, Laura Still, Yan Wang, Khuong An Nguyen
Extended Abstract 10th Symposium on Conformal and Probabilistic Prediction with Applications (COPA 2021).


The work aims to develop an automatic cutting tool life prediction model for die-cut machines at Parafix Ltd. Such a model will be able to estimate how long a given tool is likely to last, in order to improve performance and productivity.

This work is part of the KTP project between Parafix Ltd and University of Brighton.

Preface for the Proceedings of Machine Learning Research (Volume 152)

Lars Carlsson, Zhiyuan Luo, Giovanni Cherubin, Khuong An Nguyen
Preface Proceedings of Machine Learning Research (Volume 152).


This volume contains the Proceedings of the Tenth Symposium on Conformal and Probabilistic Prediction with Applications (COPA 2021), co-organised by Royal Holloway, University of London, and University of Brighton, UK. This year the Symposium is held online on September 8–10, 2021, due to the ongoing Covid-19 pandemic. For general information about conformal prediction and its sister methods, see the preface to the Proceedings of COPA 2017 (volume 60 of the PMLR), Proceedings of COPA 2018 (volume 91 of the PMLR), Proceedings of COPA 2019 (volume 105 of the PMLR), and Proceedings of COPA 2020 (volume 128 of the PMLR).

Overall, 15 papers have been accepted for publication in the Proceedings of Machine Learning Research, along with an additional paper describing the tool Orange, which is covered in the tutorials.


On generating efficient data summaries for logistic regression: A coreset-based approach

Nery Riquelme-Granada, Khuong An Nguyen, Zhiyuan Luo
Conference Paper 9th International Conference on Data Science, Technology and Applications (DATA 2020).

(Winner of the Best Paper Award)


In the era of datasets of unprecedented sizes, data compression techniques are an attractive approach for speeding up machine learning algorithms. One of the most successful paradigms for achieving good-quality compression is that of coresets: small summaries of data that act as proxies to the original input data. Even though coresets proved to be extremely useful to accelerate unsupervised learning problems, applying them to supervised learning problems may bring unexpected computational bottlenecks.

We show that this is the case for Logistic Regression classification, and hence propose two methods for accelerating the computation of coresets for this problem. When coresets are computed using our methods on three public datasets, computing the coreset and learning from it is, in the worst case, 11 times faster than learning directly from the full input data, and 34 times faster in the best case. Furthermore, our results indicate that our accelerating approaches do not degrade the empirical performance of coresets.

Fast probabilistic prediction for kernel SVM via enclosing balls

Nery Riquelme-Granada, Khuong An Nguyen, Zhiyuan Luo
Conference Paper 9th Symposium on Conformal and Probabilistic Prediction with Applications (COPA 2020).


Support Vector Machine (SVM) is a powerful paradigm that has proven to be extremely useful for the task of classifying high-dimensional objects. In principle, SVM allows us to train scoring classifiers, i.e., those that output a prediction score; however, it can also be adapted to produce probability-type outputs through the use of the Venn-Abers framework. This allows us to obtain valuable information on the label distribution for each test object. This procedure, however, is restricted to very small data given its inherent computational complexity.

We circumvent this limitation by borrowing results from the field of computational geometry. Specifically, we make use of the concept of a coreset: a small summary of data that is constructed by discretising the feature space into enclosing balls, so that each ball will be represented by only one point.

Our results indicate that training Venn-Abers predictors using enclosing balls provides an average acceleration of 8 times compared to the regular Venn-Abers approach while largely retaining probability calibration. These stimulating results imply that we can still enjoy well-calibrated probabilistic outputs for kernel SVM even in the realm of large-scale datasets.
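
The enclosing-ball idea described above can be illustrated with a small sketch. It is a simplified greedy variant written for this page, not the construction from the paper: the function name, the fixed radius and the first-come choice of ball centres are assumptions.

```python
import math


def enclosing_ball_summary(points, radius):
    """Greedily cover the points with balls of a fixed radius.

    Each ball is represented by a single point (its centre), and the
    weight records how many original points the ball absorbed.
    """
    centres, weights = [], []
    for point in points:
        for i, centre in enumerate(centres):
            if math.dist(point, centre) <= radius:
                weights[i] += 1  # the point falls inside an existing ball
                break
        else:
            centres.append(point)  # no ball covers the point: open a new one
            weights.append(1)
    return centres, weights


# Hypothetical 2D feature vectors forming two well-separated groups.
data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.2), (0.0, 0.2)]
centres, weights = enclosing_ball_summary(data, radius=1.0)
```

A downstream learner would then be trained on the weighted centres instead of the full data, which is where the speed-up comes from; how well calibration is preserved depends on the radius and on the construction actually used.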

Epidemic contact tracing with smartphone sensors

Khuong An Nguyen, Zhiyuan Luo, Chris Watkins
Journal Article Journal of Location Based Services | Special issue on Contact-Tracing, Apps and Location-based data for the COVID-19 Pandemic | Taylor & Francis.

(Impact factor: 1.36)


Contact tracing is widely considered an effective procedure in the fight against epidemic diseases. However, one of the challenges for technology-based contact tracing is the high number of false positives, which calls into question its trustworthiness and efficiency amongst the wider population for mass adoption.

To this end, this paper proposes a novel, yet practical smartphone-based contact tracing approach, employing WiFi and acoustic sound for relative distance estimation, in addition to ambient air pressure and magnetic field environment matching. We present a model combining 6 smartphone sensors, prioritising some of them when certain conditions are met.

We empirically verified our approach in various realistic environments to demonstrate up to 95% fewer false positives and 62% higher accuracy than a Bluetooth-only system. To the best of our knowledge, this paper was one of the first works to propose a combination of smartphone sensors for contact tracing.


Realtime tracking of passengers on the London underground transport by matching smartphone accelerometer footprints

Khuong An Nguyen, You Wang, Guang Li, Zhiyuan Luo, Chris Watkins
Journal Article Sensors (2019) | Special issue in Smart City and Smart Infrastructure | MDPI.

(Impact factor: 3.43)


Passengers travelling on the London underground tubes currently have no means of knowing their whereabouts between stations. The challenge for providing such service is that the London underground tunnels have no GPS, WiFi, Bluetooth or any kind of terrestrial signals to leverage.

This paper presents a novel, yet practical idea to track passengers in realtime using the smartphone accelerometer and a training database of the entire London underground network. Our rationale is that London tubes are self-driving transports with predictable accelerations, decelerations and travelling time, and that they always travel on the same fixed rail lines between stations with distinctive bumps and vibrations, which permits us to generate an accelerometer map of the tubes' movements on each line. Given the passenger's accelerometer data, we identify in realtime what line they are travelling on, and what station they depart from, using a pattern matching algorithm, with an accuracy of up to about 90% when the sampling length is equivalent to at least 3 station stops. We incorporate Principal Component Analysis to perform inertial tracking of the passenger's position along the line, when trains break away from scheduled movements during rush hours.

Our proposal was painstakingly assessed on the entire London underground covering approximately 940 kilometres of travelling distance, spanning across 381 stations on 11 different lines.

Location tracking using smartphone accelerometer and magnetometer traces

Khuong An Nguyen, Raja Naeem Akram, Konstantinos Markantonakis, Zhiyuan Luo, Chris Watkins
Conference Paper 14th International Conference on Availability, Reliability and Security (ARES 2019).


We demonstrate a breach in smartphone location privacy through the accelerometer and magnetometer's footprints. The merits or otherwise of explicitly permissioned location sensors are not the point of this paper. Instead, our proposition is that other non-location-sensitive sensors can track users accurately when the users are in motion, as in travelling on public transport, such as trains, buses, and taxis.

Through field trials, we provide evidence that high accuracy location tracking can be achieved even via non-location-sensitive sensors for which no access authorisation is required from users on a smartphone.

Coreset-based Conformal Prediction for large-scale learning

Nery Riquelme-Granada, Khuong An Nguyen, Zhiyuan Luo
Conference Paper 8th Symposium on Conformal and Probabilistic Prediction with Applications (COPA 2019).


As the volume of data increases rapidly, most traditional machine learning algorithms become computationally prohibitive. Furthermore, the available data can be so big that a single machine's memory can easily be exceeded.

We propose Coreset-Based Conformal Prediction, a strategy for dealing with big data by applying conformal predictors to a weighted summary of data, namely the coreset. We compare our approach against stand-alone inductive conformal predictors over three large competition-grade datasets to demonstrate that our coreset-based strategy may not only significantly improve the learning speed, but also retain the predictions' validity and the predictors' efficiency.


Cover your cough: Detection of respiratory events with confidence using a smartwatch

Khuong An Nguyen, Zhiyuan Luo
Conference Paper Proceedings of Machine Learning Research, Volume 91: Conformal and Probabilistic Prediction and Applications.


Coughing and sneezing are the most common means of spreading respiratory diseases amongst humans. Existing approaches to detect coughing and sneezing events are either intrusive or do not provide any reliability measure.

This paper offers a novel proposal to reliably and non-intrusively detect such events using a smartwatch as the underlying hardware and Conformal Prediction as the underlying software.

We rigorously analysed the performance of our proposal with the Harvard ESC Environmental Sound dataset, and real coughing samples taken from a smartwatch in different ambient noises.

Multi-target learning

Ilia Nouretdinov, Khuong An Nguyen, Alex Gammerman
Technical Report Part of the ExCAPE project.


This report describes work in progress, analysing the ExCAPE data for the possibility of multi-target learning.

We start by observing the structure of missing values (labels), as the sets of examples overlap but are not identical for different targets.

Then we concentrate on the part of the data with full information in order to consider the mutual dependence between the targets, and the possibility of improving prediction by combining the information.

Performance analysis of Mondrian Conformal Prediction for the top 10 targets in the ExCAPE dataset

Khuong An Nguyen
Technical Report Part of the ExCAPE project.


This report summarises the performance of Mondrian Inductive Conformal Prediction on the top 10 largest targets in the ExCAPE dataset.

Out of 526 targets in the ExCAPE database, only the top 150 have more than 100,000 compounds. The number of compounds drops below 5,000 beyond the top 200.


Co-location epidemic tracking on London public transports using low power mobile magnetometer

Khuong An Nguyen, Chris Watkins, Zhiyuan Luo
Conference Paper 8th International Conference on Indoor Positioning and Indoor Navigation (IPIN 2017).

(Nominated for the Best Paper Award)


Public transport provides an ideal environment for the transmission of contagious diseases.

This paper introduces a novel idea to detect co-location of people in such environments using just the ubiquitous geomagnetic field sensor on the smartphone. Essentially, given that all passengers must share the same journey between at least two consecutive stations, we have a long window to match the user trajectory.

Our idea was assessed by a painstaking survey of over 150 kilometres of travelling distance, covering different parts of London, using the overground trains, the underground tubes and the buses.

On assessing the positioning accuracy of Google Tango in challenging indoor environments

Khuong An Nguyen, Zhiyuan Luo
Conference Paper 8th International Conference on Indoor Positioning and Indoor Navigation (IPIN 2017).


The major challenges for optical based tracking are the lighting condition, the similarity of the scene, and the position of the camera.

This paper demonstrates that under such conditions, the positioning accuracy of Google's Tango platform may deteriorate from fine-grained centimetre level to metre level.

The paper proposes a particle filter based approach to fuse the WiFi signal and the magnetic field, which are not considered by Tango, and outlines a dynamic positioning selection module to deliver seamless tracking service in these challenging environments.

Dynamic route prediction with the magnetic field strength for indoor positioning

Khuong An Nguyen, Zhiyuan Luo
Journal Article International Journal of Wireless and Mobile Computing (2017) | Volume 12, Issue 1.

(Impact factor: 0.46)


WiFi fingerprinting has been a popular approach for indoor positioning in the past decade. However, most existing fingerprint-based systems were designed as an on-demand service to guide users to their desired destination.

This article introduces a novel feature that allows the positioning system to predict in advance which walking route the user may use, and the potential destination. To achieve this goal, a new so-called routine database will be used to maintain the magnetic field strength in the form of training sequences that represent the walking trajectories. The benefit of the system is that it does not adhere to a certain predicted trajectory. Instead, the system dynamically adjusts the prediction as more data are exposed throughout the user's journey. The proposed system was tested in a real indoor environment to demonstrate that it not only successfully estimated the route and the destination, but also improved the single positioning prediction.

A performance guaranteed indoor positioning system using conformal prediction and the WiFi signal strength

Khuong An Nguyen
Journal Article Journal of Information and Telecommunication | Volume 1, Issue 1 | Taylor & Francis.


Indoor navigation provides the positioning service to indoor users, where GPS coverage is not available. The challenges for most signal-based indoor positioning systems are the unpredictable signal propagation caused by complex building interiors, and the dynamics of the environment caused by people's movements. However, most existing systems made no assumption about the quality of their predictions, which is crucial in such noisy indoor environments.

To address this challenge, this article proposes a confidence measure to reflect the uncertainty of the positioning prediction. More importantly, the users may control the size of the prediction set by setting the confidence level tailored to their personal requirements. The proposed approach in this article has been validated in three real office buildings with challenging indoor environments, which indicated that it was up to 20% more accurate than traditional Naïve Bayes and Weighted K-nearest neighbours (W-KNN) algorithms.
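
How a confidence level controls the size of a conformal prediction set can be sketched as below. This is a generic inductive conformal prediction illustration under assumed inputs, not the article's implementation: the function names, the calibration scores and the room labels are hypothetical, and the non-conformity scores are taken as given rather than computed from WiFi signals.

```python
def conformal_p_value(calibration_scores, test_score):
    """Smoothed fraction of calibration scores at least as non-conforming."""
    at_least = sum(1 for score in calibration_scores if score >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_scores, candidate_scores, confidence):
    """Keep each candidate label whose p-value exceeds 1 - confidence."""
    significance = 1.0 - confidence
    return sorted(
        label
        for label, score in candidate_scores.items()
        if conformal_p_value(calibration_scores, score) > significance
    )


# Hypothetical non-conformity scores from a held-out calibration set.
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
# Hypothetical scores of one test sample under each candidate location.
candidates = {"room_A": 0.15, "room_B": 0.55, "room_C": 0.95}

low_confidence_set = prediction_set(calibration, candidates, confidence=0.5)
high_confidence_set = prediction_set(calibration, candidates, confidence=0.95)
```

Raising the confidence level lowers the rejection threshold, so more candidate locations survive and the prediction set grows; lowering it yields a smaller, more specific set at the cost of a higher permitted error rate.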


Machine learning based WiFi location fingerprinting

Khuong An Nguyen
Thesis Ph.D Thesis, University of London.


Indoor localisation provides the positioning service to indoor users, where GPS coverage is not available. Much research effort has been invested into 'Location Fingerprinting', which is considered one of the most effective indoor tracking methods to date. Fingerprint-based approaches piggyback on top of the existing indoor communication layers such as the WiFi network to provide the location-based service. However, the challenges of fingerprinting are the huge training database, the dynamic indoor environment, and the difficulty WiFi fingerprints have in providing fine-grained positioning accuracy at certain indoor positions. This thesis addresses the mentioned problems using several machine learning algorithms and additional information observed from the users and the indoor environment.

The proposed approaches in this thesis have been validated in real offices with challenging indoor environments. One test bed has multiple buildings and floors, and has been previously used in the EvAAL 2015 indoor positioning competition, which provides a relative baseline for the proposed techniques. In particular, the regression and classification algorithms in this thesis were ranked second and third out of the 5 contestants, under the same competition's test domain. In addition, they were up to 20% more accurate than traditional Naive Bayes and W-KNN algorithms.


On the feasibility of using two mobile phones and WLAN signal to detect co-location of two users for epidemic prediction

Khuong An Nguyen, Zhiyuan Luo, Chris Watkins
Book Chapter Progress in Location-Based Services (2014) | Springer | Pages 63-78 | ISBN: 978-3-319-11878-9.


An epidemic may be controlled or predicted if we can monitor the history of physical human contacts. As most people have a smart phone, a contact between two persons can be regarded as a handshake between the two phones. Our task becomes how to detect the moment the two mobile phones are close.

In this paper, we investigate the possibility of using the outdoor WLAN signals, provided by public Access Points, for off-line mobile phone collision detection. Our method does not require GPS coverage or real-time monitoring. We designed an Android app running in the phone’s background to periodically collect the outdoor WLAN signals. These data are then analysed to detect the potential contacts. We also discuss several approaches to handle mobile phone diversity and the WLAN scanning latency issue. Based on our measurement campaign in the real world, we conclude that it is feasible to detect the co-location of two phones with the WLAN signals only.


Selective mixture of Gaussians clustering for location fingerprinting

Khuong An Nguyen, Zhiyuan Luo
Conference Paper 11th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous 2014).


One of the challenges of deploying location fingerprinting in real offices is the training database handling process, which does not scale well as the tracking space to be covered grows. However, little attention has been paid to this issue, as the majority of previous work focused instead on improving the tracking accuracy.

In this paper, we propose a novel idea to enhance fingerprinting's processing speed and positioning accuracy with mixture of Gaussians clustering. We realised that the key difference between fingerprinting and other unsupervised problems is that we know the label (the Cartesian co-ordinate) of the signal data in advance. This key information was largely ignored in previous work, where the fingerprinting clustering was based solely on the signal data information. By exploiting this information, we tackle the indoor signal multipath and shadowing with two-level signal data clustering and Cartesian co-ordinate clustering.

We tested our approach in a real office environment with harsh indoor conditions, and concluded that our clustering scheme not only reduces the fingerprinting processing time, but also improves the positioning accuracy.

Semi-Automatic indoor fingerprinting database crowdsourcing with continuous movements and the social contacts

Khuong An Nguyen
Technical Report


Indoor localisation helps monitor the position of a person inside a building, without GPS coverage. In the past decade, much research effort has been invested into Indoor Fingerprinting, which is considered one of the most effective indoor tracking methods to date.

In recent years, some researchers started looking at crowdsourcing the fingerprinting database with contributions from indoor users via mobile phones or laptop PCs. However, the crowdsourcing process was greatly limited due to the lack of an indoor reference, in contrast to the widespread use of the GPS reference for outdoor crowdsourcing.

In this paper, we propose a novel idea to crowdsource the fingerprinting database without any preset infrastructure, landmarks, or advanced sensors. Our idea is based on the observations that users often carry a mobile phone with them, and that there are multiple social contacts amongst those users indoors. First, we exploit the user's continuous movement indoors to refine the location prediction set. Our approach can be applied to enhance other systems. Second, we use a unique concept to detect the indoor social contacts with NFC by tapping the backs of the two phones together. Third, we propose a novel idea to combine this social contact and the user's continuous movements to identify, with confidence, the exact entries in the fingerprinting database that need updating for crowdsourcing. Finally, we share our thoughts on automating the crowdsourcing process without any user input.


Evaluation of Bluetooth properties for indoor localisation

Khuong An Nguyen, Zhiyuan Luo
Book Chapter Progress in Location-Based Services (2013) | Springer | Pages 127-149 | ISBN: 978-3-642-34202-8.


Current indoor localisation systems make use of common wireless signals such as Bluetooth and WiFi to track the users inside a building. Amongst those, Bluetooth is widely known for its low power consumption, small maintenance cost, and widespread availability in commodity devices. Understanding the properties of such a wireless signal aids the tracking system design. However, little research has been done to understand the properties of the Bluetooth wireless signal amongst the current Bluetooth-based tracking systems.

In this chapter, the most important Bluetooth properties related to indoor localisation are experimentally investigated from a statistical perspective. A Bluetooth-based tracking system is proposed and evaluated with the location fingerprinting technique to incorporate the Bluetooth properties described in the chapter.

Reliable indoor location prediction using conformal prediction

Khuong An Nguyen, Zhiyuan Luo
Journal Article Annals of Mathematics and Artificial Intelligence (2013) | Volume 74, Issue 1.

(Impact factor: 0.78)


Indoor localisation is the state-of-the-art approach to identifying and observing a moving human or object inside a building. However, because of the harsh indoor conditions, current indoor localisation systems remain either too expensive or not accurate enough.

In this paper, we tackle the latter issue from a different direction, with a new conformal prediction algorithm to enhance the accuracy of the prediction. We handle the common indoor signal attenuation issue, which introduces errors into the training database, with a reliability measurement for our prediction. We show why our approach performs better than other solutions through empirical studies with two testbeds. To the best of our knowledge, we are the first to apply conformal prediction to localisation in general, and to indoor localisation in particular.

Enhanced Conformal Predictors for indoor localisation based on fingerprinting method

Khuong An Nguyen, Zhiyuan Luo
Conference Paper 9th IFIP International Conference on Artificial Intelligence Applications & Innovations (AIAI 2013).


We proposed the first Conformal Prediction (CP) algorithm for indoor localisation with a classification approach. The algorithm can provide a region of predicted locations, and a reliability measurement for each prediction. However, one of the shortcomings of the former approach was the individual treatment of each dimension. In reality, the training database usually contains multiple signal readings at each location, which can be used to improve the prediction accuracy.

In this paper, we enhance our former CP with the Kullback-Leibler divergence, and propose two new classification CPs. The empirical studies show that our new CPs performed slightly better than the previous CP when the resolution and density of the training database are high. However, the new CPs perform much better than the old CP when the resolution and density are low.
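
As a rough illustration of how the Kullback-Leibler divergence can compare the multiple signal readings stored at each location, the sketch below builds a smoothed histogram of RSS readings and measures how far a live scan diverges from each stored location. This is not the paper's algorithm: the function names, the bin range, the add-one smoothing and the sample readings are all assumptions made for illustration.

```python
import math
from collections import Counter

# Assumed RSS bins in dBm (hypothetical range, one bin per integer value).
BINS = list(range(-80, -39))


def rss_histogram(readings, bins=BINS, smoothing=1.0):
    """Turn integer RSS readings into a smoothed probability distribution.

    Add-one (Laplace) smoothing keeps every bin strictly positive, so the
    KL divergence below never divides by zero.
    """
    counts = Counter(readings)
    total = len(readings) + smoothing * len(bins)
    return [(counts[b] + smoothing) / total for b in bins]


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical readings from one access point at two surveyed locations.
location_a = rss_histogram([-50, -51, -50, -52, -50])
location_b = rss_histogram([-70, -71, -70, -69, -70])
live_scan = rss_histogram([-50, -50, -51])

divergence_to_a = kl_divergence(live_scan, location_a)
divergence_to_b = kl_divergence(live_scan, location_b)
```

Under these assumptions the live scan diverges less from location A than from location B, so a smaller divergence could serve as evidence that a candidate location conforms with the stored readings.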


Conformal Prediction for indoor localisation with fingerprinting method

Khuong An Nguyen, Zhiyuan Luo
Conference Paper 8th IFIP International Conference on Artificial Intelligence Applications & Innovations (AIAI 2012).


Indoor localisation is the state-of-the-art approach to identifying and observing a moving human or object inside a building. Location Fingerprinting is a cost-effective software-based solution utilising the built-in wireless signal of the building to estimate the most probable position of real-time signal data. In this paper, we apply the Conformal Prediction (CP) algorithm to further enhance the Fingerprinting method. We design a new nonconformity measure with the Weighted K-nearest neighbours (W-KNN) as the underlying algorithm. Empirical results show good performance of the CP algorithm.
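
To make the idea concrete, here is a minimal sketch of a transductive conformal predictor over fingerprints. It uses the textbook nearest-neighbour ratio as the nonconformity measure (distance to the nearest same-label examples divided by distance to the nearest other-label examples), which is not necessarily the W-KNN measure designed in the paper. The function names, the two-access-point fingerprints and the room labels are hypothetical.

```python
import math


def knn_ratio_score(x, label, others, k=1):
    """Textbook k-NN nonconformity score: small when x sits among its own label."""
    same = sorted(math.dist(x, f) for f, l in others if l == label)[:k]
    diff = sorted(math.dist(x, f) for f, l in others if l != label)[:k]
    if not same or sum(diff) == 0:
        return math.inf  # no same-label support, or sitting on another label
    return sum(same) / sum(diff)


def p_value(x, label, training, k=1):
    """Fraction of examples at least as nonconforming as (x, label)."""
    bag = training + [(x, label)]
    scores = [
        knn_ratio_score(f, l, bag[:i] + bag[i + 1:], k)
        for i, (f, l) in enumerate(bag)
    ]
    return sum(s >= scores[-1] for s in scores) / len(bag)


def prediction_set(x, training, confidence, k=1):
    """Keep every label whose p-value exceeds the significance level."""
    epsilon = 1.0 - confidence
    labels = sorted({l for _, l in training})
    return [y for y in labels if p_value(x, y, training, k) > epsilon]


# Hypothetical fingerprints: (RSS from AP1, RSS from AP2) in dBm, with a room label.
training = [
    ((-40, -70), "A"), ((-42, -68), "A"), ((-41, -72), "A"),
    ((-70, -40), "B"), ((-68, -43), "B"), ((-72, -41), "B"),
]
live_scan = (-41, -69)

narrow = prediction_set(live_scan, training, confidence=0.8)
wide = prediction_set(live_scan, training, confidence=0.9)
```

With this toy data, asking for a higher confidence level widens the prediction set, because with only seven examples the smallest possible p-value is 1/7. The code requires Python 3.8 or later for `math.dist`.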


A case study on the usability of NXT-G programming language

Khuong An Nguyen
Conference Paper 23rd Annual Workshop on Psychology of Programming (2011).


The release of the Lego Mindstorms kit has carried the flexibility and creativity of Lego into the world of robotics, whilst targeting a varied audience of children and adults. To achieve this goal, a programming language called NXT-G was developed to provide everyone full control of the Lego Mindstorms kit, regardless of their programming experience.

In this project, the programming language's ambition is tested through practical experiments. In a controlled experiment, twelve participants carry out four tasks using the NXT-G software and a Lego robot. Their performances are then analysed to confirm the stated claim.

Robot-based evaluation of Bluetooth fingerprinting

Khuong An Nguyen
Thesis M.Phil Thesis, University of Cambridge.


This thesis proposed and implemented a new affordable indoor tracking system. The Bluetooth signal was found to be very stable and reliable for any indoor positioning system. The Fingerprinting method was employed to manipulate the Bluetooth signal at many positions in the office room. In addition, a robot was created to perform the complex and time-consuming data collection process.

Spam filtering with Naive Bayes classification and the Porter stemming algorithm

Khuong An Nguyen
Technical Report Natural Language Processing project, University of Cambridge.


Spam (junk-email) identification is a well-documented research area. A good spam filter is not only judged by its accuracy in identifying spam, but also by its performance.

This project aims to replicate a Naive Bayesian spam filter, as described in the "SpamCop: A Spam Classification & Organization Program" paper. The accuracy and performance of the filter are examined with the GenSpam corpus. In addition, the project investigates the actual effect of the Porter Stemming algorithm on such a filter.
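
For readers unfamiliar with the technique, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the SpamCop design nor the project's code, and the toy messages are invented rather than taken from GenSpam. The `stem` argument is a hypothetical hook where a Porter stemmer could be plugged in; no stemmer is implemented here, so it defaults to the identity function.

```python
import math
from collections import Counter


def tokenize(text, stem=lambda word: word):
    """Lower-case, split on whitespace, and pass each token through `stem`."""
    return [stem(word) for word in text.lower().split()]


class NaiveBayesFilter:
    """Multinomial Naive Bayes over word counts; assumes both labels get trained."""

    LABELS = ("spam", "ham")

    def __init__(self, stem=lambda word: word):
        self.stem = stem
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text, self.stem))

    def classify(self, text):
        vocabulary = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text, self.stem)
        scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(vocabulary)
            # Work in log space so long messages do not underflow to zero.
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy messages, for illustration only.
spam_filter = NaiveBayesFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting agenda attached", "ham")
spam_filter.train("lunch tomorrow with the team", "ham")
```

Calling `spam_filter.classify("claim your free money")` on this toy model returns `"spam"`. Comparing accuracy with and without a real stemmer passed as `stem` is one way to measure the stemming effect that the project investigates.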


An EMV (Chip & PIN) survey

Khuong An Nguyen
Thesis B.Sc Thesis, University of London.


The thesis was one of the first to investigate the structure of the UK Chip & PIN debit and credit cards. Particularly, we looked into CAMs (Card Authentication Methods) implemented by different banks to understand their policies.

A Java-based reader was developed to exchange information with all UK Debit/Credit cards; some overseas cards were also tested. The software can simulate an ATM to perform off-line PIN verification.


My research is high-impact and practical, and as a young lecturer I still code and collect data myself. This will benefit enthusiastic students who would like to apply their ideas in the real world.

If you are interested in doing research on such topics, feel free to get in touch.

I have had the pleasure to supervise the following students.


- Julia Meister (2021 - present): top 10 students in BSc Computer Science cohort at RHUL; Best Research Paper award at the 12th WISTP and 11th COPA international conferences.

- Xu Feng (2021 - present): owning a patent in sensor design; 85% GPA at Zhejiang University (China).

- Ajibola Obayemi (2022 - present): Software Engineering Manager at BCMY Ltd.

- Robert Choudhury (2020 - present): Telecoms software engineer, Distinction in MSc in Information Security at RHUL
co-supervised with Prof. Zhiyuan Luo.

- Dr. Nery Riquelme-Granada (2018 - 2021): Best Research Paper award at the 9th DATA 2020 international conference.
co-supervised with Prof. Zhiyuan Luo.
Thesis: "Coreset-based Protocols for Machine Learning Classification".


- Alice Ashby (2021 - 2022)
Thesis: "A novel cough audio pre-processing and segmentation algorithm for COVID-19 detection" (awarded 95%).
("Chris Boyne Prize" for the Best Usability Project)

- Alex Powell (2021 - 2022)
Thesis: "Study of Machine Learning Models for the Purpose of Natural Language Classification" (awarded 85%).

- Diego Toledano (2019 - 2020)
Thesis: "Solving 2048 using AI".

- Marsela Gavrilova (2019 - 2020)
Thesis: "Regression algorithms for learning".


All of my M.Sc students graduated with a Distinction degree.

- Dr. Ruth Blackwell (2019)
is now a Teaching Fellow at Royal Holloway University, UK.
Thesis: "WiFi Fingerprinting: three machine learning algorithms, six distance metrics and the UJIIndoorLoc database".

- Baldeep Singh (2019)
is now a Machine Learning Engineer at Kingston University, UK.
Thesis: "Machine Learning Based Indoor Positioning".

- Daniel Tirabasso (2019)
is now a CEO at Cogmatic, UK.
Thesis: "Dimensionality Reduction for Wi-Fi Fingerprinting based Indoor Localisation Systems".

- Arif Syed (2019)
is now a Machine Learning Engineer at InforceHub, UK.
Thesis: "Conformal predictors for detecting harmful respiratory events".

- Bharat Sikka (2019)
is now a Data Scientist at State Bank of India.
Thesis: "Machine Learning Based Indoor Positioning Using Wi-Fi Fingerprinting".

- Bhavyasree Pulivarthi (2020)
Thesis: "Machine Learning based Indoor Positioning".

- Karan Shah (2020)
Thesis: "Study and Comparison of Machine Learning Algorithms for Wi-Fi Based Indoor Localization".