Introduction
Learning and recognizing human activities, such as activities of daily living (ADLs), is not only useful for building pervasive home monitoring systems; ADLs are also important indicators of cognitive and physical well-being in both healthy and ill people. ADL recognition enables many applications, for example:
- allowing computing systems to proactively assist users with their tasks;
- providing richer information from past activities for medical diagnosis;
- assisting patients with chronic impairments, supporting personal fitness training and rehabilitation, and encouraging people to adopt a healthy lifestyle;
- keeping young children away from dangerous areas (e.g. the stove or balcony);
- changing the gaming experience (e.g. the Microsoft Kinect).
Meanwhile, population ageing has become a global phenomenon, driven by longer life expectancy and falling birth rates. An ageing population and a changing population structure bring both opportunities and challenges for the economy, services and society at national and local levels, so society is paying more attention to elderly healthcare; care for patients likewise remains in the spotlight. To provide better and more timely care for both groups, researchers are leveraging different sensors, e.g. WiFi, UWB, inertial sensors and cameras, to detect people's ADLs. This is so-called human activity recognition (HAR).
So in this post, you will see:
1) What the HAR process, namely the HAR chain, looks like.
2) How to recognize ADLs using machine learning and deep learning algorithms.
The experiment data and source code used here come from the seglearn project (see the References section).
HAR chain
Generally, sensor-based HAR Chain includes:
- Data Collection
- Data Segmentation
- Feature Normalisation or Scaling
- Feature Extraction, Feature Selection (if necessary)
- Classifier Selection
- Evaluation
Next, I will walk you through the details of each step with our example.
Experiment setup
Twenty healthy adult subjects with asymptomatic shoulders and no prior shoulder surgery were recruited and provided informed consent for participation in this study. The subjects' mean age was 28.9 years (range 19-56). There were 14 male and 6 female subjects. Fifteen subjects were right hand dominant, and five were left hand dominant.
Under the supervision of an orthopedic surgeon, each subject performed 20 repetitions of seven shoulder exercises bilaterally. The sensor used here is an Apple Watch located on the wrist of the subjects' dominant hands. The exercises performed are elements of an evidence-based rehabilitation protocol for full-thickness atraumatic rotator cuff tears (Kuhn et al. 2013) and included:
- pendulum (PEN)
- abduction (ABD)
- forward elevation (FEL)
- internal rotation (IR)
- external rotation (ER)
- trapezius extension (TRAP)
- upright row (ROW)
Data Collection
The 6-axis raw sensor data consists of total acceleration a = [ax, ay, az] and rotational velocity ω = [ωx, ωy, ωz], measured in the coordinate frame of the watch. No further preprocessing or filtering was applied to the raw data. The sensor data was acquired from the active extremity using an Apple Watch (Series 2 & 3) with the PowerSense app, sampling at fs = 50 Hz.
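Before going further, it helps to look at the raw data. The dataset ships with seglearn, so you can load it and inspect a recording directly. A minimal sketch; X and y are the keys used in the code later in this post, and the channel ordering in the comment is my assumption:

import numpy as np
from seglearn.datasets import load_watch

# load the smartwatch exercise dataset bundled with seglearn
data = load_watch()
X = data['X']  # list of time series, one per recording, each of shape (n_samples, 6)
y = data['y']  # one exercise label per recording

print("number of recordings:", len(X))
print("shape of first recording:", X[0].shape)  # (n_samples, 6): acceleration + rotational velocity axes
print("first label:", y[0])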
Fig 1 activity: pendulum (PEN)
Fig 2 activity: abduction (ABD)
Fig 3 activity: forward elevation (FEL)
Fig 4 activity: internal rotation (IR)
Fig 5 activity: external rotation (ER)
Fig 6 activity: trapezius extension (TRAP)
Fig 7 activity: upright row (ROW)
Data Preprocessing: Data Segmentation
The raw sensor data was segmented using overlapping fixed-length sliding windows W for each of the six sensor signals. The 3D temporal signal tensor ϕ produced from the set of N windows has shape (N, L·fs, 6), where L is the window length in seconds and fs the sampling rate. An exercise label was attributed to a window from the ground truth annotation when the exercise was performed for the entirety of that window.
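In seglearn this step is handled by the segmentation transforms used later in this post (PadTrunc, SegmentX), but the underlying idea is simple. Here is a minimal numpy sketch of fixed-length overlapping windowing; the window width and overlap are illustrative assumptions:

import numpy as np

def sliding_windows(signal, width, overlap):
    # signal: array of shape (n_samples, n_channels)
    # width: window length in samples; overlap: fraction in [0, 1)
    step = max(1, int(width * (1.0 - overlap)))
    windows = [signal[start:start + width]
               for start in range(0, len(signal) - width + 1, step)]
    return np.stack(windows)  # shape (N, width, n_channels)

# example: 10 s of 6-axis data at 50 Hz, 2 s windows with 50% overlap
x = np.random.randn(500, 6)
W = sliding_windows(x, width=100, overlap=0.5)
print(W.shape)  # (9, 100, 6)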
Fig 8 sketch of data segmentation
Feature Normalisation or Scaling
Since the range of values in raw data varies widely, the objective functions of some machine learning algorithms will not work properly without normalization. Another reason feature scaling is applied is that gradient descent converges much faster with scaled features than without.
Commonly used normalization methods include rescaling, standardization and scaling to unit length.
- Rescaling: the simplest method maps the range of each feature into [0, 1] or [−1, 1], e.g. x' = (x − min(x)) / (max(x) − min(x)).
- Standardization: makes the values of each feature have zero mean and unit variance, i.e. x' = (x − μ) / σ.
- Scaling to unit length: scales the components of a feature vector so that the complete vector has length one, i.e. x' = x / ‖x‖.
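All three are available off the shelf in scikit-learn. A minimal sketch; the toy feature matrix is made up for illustration:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, Normalizer

F = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])  # toy feature matrix: rows = segments, columns = features

print(MinMaxScaler().fit_transform(F))    # rescaling each column to [0, 1]
print(StandardScaler().fit_transform(F))  # zero mean, unit variance per column
print(Normalizer().fit_transform(F))      # each row scaled to unit length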
Feature Extraction and Feature Selection
A feature mapping F(W) comprised of typical HAR statistical and heuristic features was computed to define the feature space for the classifiers. An identical set of univariate features: mean, variance (σ^2), standard deviation (σ), maximum, minimum, skewness, kurtosis, mean crossings (ζ), mean spectral energy (ξ), and a 4-bin histogram, was computed for each signal vector in each segment. Of course, there are other possible features, such as FFT amplitude and frequency, zero crossing rate, mean crossing rate and mean of the gradient. Although we could use as many features as we like, there is no simple positive relationship between the number of features and recognition accuracy, so we need to choose the most effective features for both efficiency and accuracy. Common selection methods include principal component analysis (PCA) and linear discriminant analysis (LDA).
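To make the feature mapping concrete, here is a minimal numpy/scipy sketch that computes the univariate features above for one windowed signal. This is my own helper, not seglearn's FeatureRep, which computes an equivalent representation for you:

import numpy as np
from scipy.stats import skew, kurtosis

def window_features(w):
    # w: one segment of a single signal, shape (n_samples,)
    mu = w.mean()
    mean_crossings = np.sum(np.diff(np.signbit(w - mu)))    # zeta: count of mean crossings
    spectral_energy = np.mean(np.abs(np.fft.rfft(w)) ** 2)  # xi: mean spectral energy
    hist, _ = np.histogram(w, bins=4)                       # 4-bin histogram counts
    return np.concatenate([
        [mu, w.var(), w.std(), w.max(), w.min(),
         skew(w), kurtosis(w), mean_crossings, spectral_energy],
        hist,
    ])

w = np.random.randn(100)         # one 2 s window of one axis at 50 Hz
print(window_features(w).shape)  # (13,) — one feature vector per window per signal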
The feature extraction and feature selection introduced here mainly apply to classical machine learning (ML) algorithms, which require us to extract features manually. With the rise of deep learning, however, many researchers prefer deep neural networks because they extract features automatically, which is more likely to capture information beneficial to recognition accuracy.
Keep in mind, though, that ML and deep learning both have their strengths; deep learning is not suitable for every circumstance. In situations lacking enough samples, ML is more likely to achieve good accuracy.
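For completeness, here is a minimal sketch of the PCA-based dimensionality reduction mentioned earlier, applied to a made-up feature matrix; the number of components kept is an arbitrary choice for illustration:

import numpy as np
from sklearn.decomposition import PCA

F = np.random.randn(200, 60)  # toy feature matrix: 200 segments x 60 features

pca = PCA(n_components=10)    # keep the 10 directions of highest variance
F_reduced = pca.fit_transform(F)
print(F_reduced.shape)                      # (200, 10)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained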
Classification
Commonly used classifiers include, on the ML side, decision trees, k-nearest neighbors (k-NN), naive Bayes, support vector machines (SVM) and random forests (RF), and, on the deep learning side, DNNs, RNNs, LSTMs, CNN+LSTM combinations and so on. In this post, we will use both ML and deep learning algorithms and compare them.
Given the limited space, I am not going to discuss these classifiers in depth; please refer to outside resources if you are interested.
Evaluation
Similarly, there are multiple indicators for evaluation: accuracy, precision, recall and F-measure.
- Accuracy is the most intuitive performance measure: it is simply the ratio of correctly predicted observations to the total observations. One may think that high accuracy means our model is best. Accuracy is a great measure, but only on symmetric datasets where the numbers of false positives and false negatives are almost the same; otherwise you have to look at other parameters to evaluate the performance of your model.
- Precision is the ratio of correctly predicted positive observations to the total predicted positive observations.
- Recall (sensitivity) is the ratio of correctly predicted positive observations to all observations in the actual positive class.
- F1 score is the weighted average of precision and recall, so it takes both false positives and false negatives into account: F1 = 2 * (precision * recall) / (precision + recall). It is also called the F score or the F measure. Put another way, the F1 score conveys the balance between precision and recall (a worked example follows below).
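To make the formulas concrete, here is a minimal sketch that computes all four indicators from a made-up set of binary predictions, both by hand and with scikit-learn:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # toy ground truth
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # toy predictions: 3 TP, 1 FN, 1 FP, 5 TN

tp, fn, fp, tn = 3, 1, 1, 5
print((tp + tn) / (tp + tn + fp + fn))  # accuracy  = 0.8
print(tp / (tp + fp))                   # precision = 0.75
print(tp / (tp + fn))                   # recall    = 0.75
print(2 * 0.75 * 0.75 / (0.75 + 0.75))  # F1        = 0.75

# the same four numbers via scikit-learn
print(accuracy_score(y_true, y_pred),
      precision_score(y_true, y_pred),
      recall_score(y_true, y_pred),
      f1_score(y_true, y_pred))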
Algorithms Implementation
Note that the code we use here is written in Python; for the full source, please refer to seglearn.
The most convenient way to install it is with pip:

pip install seglearn
- Support Vector Machine Classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from seglearn.datasets import load_watch
from seglearn.pipe import Pype
from seglearn.transform import FeatureRep, PadTrunc

# load the data
data = load_watch()
X = data['X']
y = data['y']

# create a feature representation pipeline with PadTrunc segmentation
# the time series are between 20-40 seconds
# this truncates them all to the first 5 seconds (sampling rate is 50 Hz)
pipe = Pype([('trunc', PadTrunc(width=250)),
             ('features', FeatureRep()),
             ('scaler', StandardScaler()),
             ('svc', LinearSVC())])

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    shuffle=True, random_state=42)

pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)

print("N series in train: ", len(X_train))
print("N series in test: ", len(X_test))
print("N segments in train: ", pipe.N_train)
print("N segments in test: ", pipe.N_test)
print("Accuracy score: ", score)
- Random Forest Classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from seglearn.datasets import load_watch
from seglearn.pipe import Pype
from seglearn.transform import FeatureRep, PadTrunc

# load the data
data = load_watch()
X = data['X']
y = data['y']

# create a feature representation pipeline with PadTrunc segmentation
# the time series are between 20-40 seconds
# this truncates them all to the first 5 seconds (sampling rate is 50 Hz)
pipe = Pype([('trunc', PadTrunc(width=250)),
             ('features', FeatureRep()),
             ('scaler', StandardScaler()),
             ('RF', RandomForestClassifier())])

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    shuffle=True, random_state=42)

pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)

print("N series in train: ", len(X_train))
print("N series in test: ", len(X_test))
print("N segments in train: ", pipe.N_train)
print("N segments in test: ", pipe.N_test)
print("Accuracy score: ", score)
- Nearest Neighbors Classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from seglearn.datasets import load_watch
from seglearn.pipe import Pype
from seglearn.transform import FeatureRep, PadTrunc

# load the data
data = load_watch()
X = data['X']
y = data['y']

# create a feature representation pipeline with PadTrunc segmentation
# the time series are between 20-40 seconds
# this truncates them all to the first 5 seconds (sampling rate is 50 Hz)
pipe = Pype([('trunc', PadTrunc(width=250)),
             ('features', FeatureRep()),
             ('scaler', StandardScaler()),
             ('KNN', KNeighborsClassifier(n_neighbors=7))])

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    shuffle=True, random_state=42)

pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)

print("N series in train: ", len(X_train))
print("N series in test: ", len(X_test))
print("N segments in train: ", pipe.N_train)
print("N segments in test: ", pipe.N_test)
print("Accuracy score: ", score)
- Convolution and RNN (LSTM) Combination Classification
from keras.layers import Dense, LSTM, Conv1D
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import train_test_split
from seglearn.datasets import load_watch
from seglearn.pipe import Pype
from seglearn.transform import SegmentX

def crnn_model(width=100, n_vars=6, n_classes=7,
               conv_kernel_size=5, conv_filters=10, lstm_units=10):
    input_shape = (width, n_vars)
    model = Sequential()
    model.add(Conv1D(filters=conv_filters, kernel_size=conv_kernel_size,
                     padding='valid', activation='relu', input_shape=input_shape))
    model.add(Conv1D(filters=conv_filters, kernel_size=conv_kernel_size,
                     padding='valid', activation='relu'))
    model.add(LSTM(units=lstm_units, dropout=0.1, recurrent_dropout=0.1))
    model.add(Dense(n_classes, activation="softmax"))
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

# load the data
data = load_watch()
X = data['X']
y = data['y']

# create a segment learning pipeline
width = 100
pipe = Pype([('seg', SegmentX(order='C')),
             ('crnn', KerasClassifier(build_fn=crnn_model, epochs=8,
                                      batch_size=256, verbose=0))])

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=42)

pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)

print("N series in train: ", len(X_train))
print("N series in test: ", len(X_test))
print("N segments in train: ", pipe.N_train)
print("N segments in test: ", pipe.N_test)
print("Accuracy score: ", score)
- Evaluation
This step you can implement yourself; remember that we use the F-score as the evaluation indicator here. A minimal sketch follows.
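This sketch reuses any of the fitted pipelines above. Because the pipeline segments the series, the test labels have to be expanded to match the segments; seglearn's Pype provides transform_predict for this (check the seglearn API docs for your version). Macro averaging is my own choice here, since the task has seven classes:

from sklearn.metrics import f1_score

# expand the test labels to segment level and predict with the fitted pipeline
y_true, y_pred = pipe.transform_predict(X_test, y_test)

# macro-averaged F1 treats all seven exercise classes equally
print("F1 score: ", f1_score(y_true, y_pred, average='macro'))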
Follow the code above and run it to see which algorithm achieves better accuracy, and then see what else you can find.
Conclusion
In this post, I mainly introduced the HAR chain and showed how to use machine learning and deep learning algorithms to recognize seven classes of activities using the dataset from the work Shoulder Physiotherapy Exercise Recognition. Since the mathematical principles of machine learning and deep learning are complicated, and to avoid going off topic, I did not look deeply into the explanation of these algorithms. However, if you want to do some innovation, I recommend you spare some time to seriously learn their principles.
Back to our topic: human activity recognition is a hotspot of recent research. If you share this interest with us, we can step into deeper ground together.
References
- seglearn: https://dmbee.github.io/seglearn/install.html
- Shoulder Physiotherapy Exercise Recognition: Machine Learning the Inertial Signals from a Smartwatch: https://arxiv.org/pdf/1802.01489.pdf