Showing posts with label Tutorial.

Wednesday, 13 March 2019

[Tutorial] Importing modules and packages in Python

There are many ways to import modules or packages.
  1. regular imports
  2. from __ import __
  3. relative imports
  4. optional imports
  5. local imports

regular imports

Import a single module or package:
import sys
import sys as system

Import multiple modules or packages on one line:
import os, sys, time

Import a submodule of a package:
import urllib.error

from __ import __

from ... import is used when you only want to import part of a module or package:
from functools import lru_cache
from os import path, walk, unlink
from os import uname, remove

from os import (path, walk, unlink, uname, 
                remove, rename)
from os import path, walk, unlink, uname, \
                remove, rename
from ... import * is used when you want to import everything a module or package exposes:
from os import *

relative imports

Case 1: 
Given a file structure like the one assumed below, we want to import subpackage1 and subpackage2 in the top-level __init__.py.
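The original figure showing the file structure is not reproduced here; based on the names used in the text, the layout is assumed to look roughly like this:

my_package/
    __init__.py
    subpackage1/
        __init__.py
        module_x.py
        module_y.py
    subpackage2/
        __init__.py

The top-level __init__.py then contains: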
from . import subpackage1
from . import subpackage2

But running this raises an error, because the subpackages have not been initialized yet.
The workaround is to add two plain imports before using "from . import subpackage1":
import subpackage1
import subpackage2

from . import subpackage1
from . import subpackage2
A directory named __pycache__ is then generated inside both subpackage1 and subpackage2.
Once these cached files exist, the next time you can directly use
from . import subpackage1
from . import subpackage2

You may notice that the file inside the __pycache__ directory is named __init__.cpython-37.pyc.
What does that mean? Let's continue and come back to this later.
If we want to import module_x and module_y in the __init__.py under subpackage1:
from . import module_x
from . import module_y
the same problem occurs. We then apply the same workaround as above:
import module_x
import module_y

from . import module_x
from . import module_y

Then, in subpackage1's __pycache__ directory, you will find:
__init__.cpython-37.pyc
module_x.cpython-37.pyc
module_y.cpython-37.pyc

So this looks like a registration: once the compiled files appear in this directory, you can use the relative imports.

Case 2: 
Suppose we want to use my_package, along with the subpackages and modules inside it, from somewhere else. For example, we create a separate test script, test.py. The way to import my_package there is:
import sys
sys.path.append('C:/Users/acw393/Dropbox/SecondYearResearch')
import my_package

from my_package import module_a
Note that the path appended to sys.path should be the parent directory of my_package.

This case is very useful when you work with local modules.

Here is an example of using local modules (note: in this case the target module and the current run file are not in the same directory).
The target module lives in its own directory, and its __init__.py is the entry point of the whole package: if you import seglearn, __init__.py runs automatically.

But I want to use this module, seglearn, from a file in a different directory.

What we should do is:
1) add the path of the directory that contains the target module to the system path, in the module's __init__.py:
__init__.py:
import sys
sys.path.append('C:/Users/Bang/Dropbox/SecondYearResearch/Seglearn-revised')
import seglearn

2) add the same lines to any run file where you directly use seglearn:
import sys
sys.path.append('C:/Users/Bang/Dropbox/SecondYearResearch/Seglearn-revised')
import seglearn

From this we can see that if we create a package (which should have an __init__.py that imports all its submodules), we only need to add the following code to the __init__.py of that package:
import sys
sys.path.append(path)  # path is the directory that contains the defined module
import name_definedmodule

Then add the same code in any file where you want to import it.

Optional imports

Optional imports are used when you prefer a certain module but want a backup when the preferred one doesn't exist:
try:
    # For Python 3
    from http.client import responses
except ImportError:  # For Python 2.5-2.7
    try:
        from httplib import responses  # NOQA
    except ImportError:  # For Python 2.4
        from BaseHTTPServer import BaseHTTPRequestHandler as _BHRH
        responses = dict([(k, v[0]) for k, v in _BHRH.responses.items()])


try:
    # For Python 2
    from urlparse import urljoin
    from urllib2 import urlopen
except ImportError:
    # Python 3
    from urllib.parse import urljoin
    from urllib.request import urlopen


Local imports

A local import happens inside a function, so the imported module is only available in that function's local scope:
import sys  # global scope

def square_root(a):
    # This import is in the square_root function's local scope
    import math
    return math.sqrt(a)

def my_pow(base_num, power):
    # NameError here: math was only imported inside square_root's local scope,
    # so it is not visible in this function
    return math.pow(base_num, power)

if __name__ == '__main__':
    print(square_root(49))
    print(my_pow(2, 3))


Circular imports

Circular imports happen when two modules import each other, as a.py and b.py do below:
# a.py
import b

def a_test():
    print("in a_test")
    b.b_test()

a_test()


# b.py
import a

def b_test():
    print('in b_test')
    a.a_test()

b_test()

Running either file fails with an AttributeError, because each module is executed before the other has finished defining its functions.

Shadowed imports

A shadowed import happens when your own file has the same name as a module you import, e.g. when the script below is saved as math.py:
import math

def square_root(number):
    return math.sqrt(number)

square_root(72)

As long as this file is not itself named math.py, it works fine. If it is, import math picks up the file itself instead of the standard library module, and the call to math.sqrt fails with an AttributeError.

Tips for building your own modules and packages
1. Import everything your package needs in __init__.py, as shown in the sketch below. This is a good habit for a programmer.
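As a hedged illustration (the package and module names are the hypothetical ones used earlier in this post), such an __init__.py might look like:

# my_package/__init__.py
# Expose the subpackages and their modules so that users only need
# `import my_package` to reach everything.
from . import subpackage1
from . import subpackage2
from .subpackage1 import module_x
from .subpackage1 import module_y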

Tuesday, 12 March 2019

[Research] An introduction to Conda, Git, and Pip

1. Introduction

Anaconda is a distribution of Python whose main features are support for Linux, macOS and Windows and built-in environment and package management, which makes it very convenient to switch between multiple Python versions. Within Anaconda, Conda provides package, dependency and environment management for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN.

Conda is an open source package management and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.

Conda as a package manager helps you find and install packages. If you need a package that requires a different version of Python, you do not need to switch to a different environment manager, because conda is also an environment manager. With just a few commands, you can set up a totally separate environment to run that different version of Python, while continuing to run your usual version of Python in your normal environment.

Apart from conda, pip is the package installer for Python. You can use pip to install packages from the Python Package Index and other indexes.

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

So in this post, I will introduce the differences between these tools.

2. Anaconda

Figure 1. Anaconda installation directory
The usual way to install Anaconda is to download the installer from its website. After choosing the Python version and x86/x64, you can download and install it directly on your computer. Figure 1 shows the installation directory on my computer.

In the main directory we can see [env] and [python]. I mention these two because they illustrate Anaconda's main feature, environment management. [python] in this directory means that a chosen version of Python, together with Spyder, is already installed. So if you launch Spyder directly from the Start menu, you are using this pre-defined Python. Many packages are already installed in this default Python, and you can directly import and use them.
Figure 2. Packages already installed in the default environment

We can view this default as the base (root) environment. Next, I will introduce conda's environment management.
First, here is how to create a new environment:
1) open Anaconda Prompt
2) $conda create --name testpy python=3.6
3) $conda activate testpy
4) $conda install spyder
5) $spyder

Now testpy has been created under [env] and Spyder is also installed in it. Comparing Figures 1 and 3, you will find them almost the same, but the two environments work independently and so do their packages. That means the packages used by the default Spyder are not available to testpy; testpy needs to install its own.
Figure 3. testpy installation directory
Figure 4. Packages installed for testpy

3. pip

Generally, conda is mainly used for environment management. Of course, you can also use it to install or uninstall packages. But here, I want to introduce another tool, pip.

pip is used to install or uninstall packages for Python (Spyder).
The command format is: pip install package_name; pip uninstall package_name
Note: there are two ways to use pip commands directly: 1) open a command window with [Win + R] and cmd; 2) use the Anaconda Prompt in (base) mode.
The location where the downloaded packages are installed is:
Figure 5. Location of installed package
As mentioned before, we created another environment, testpy, and installed Spyder for it. We can also use pip to install packages for this environment; the install location is shown below.
Figure 6. Location of installed package for testpy
To use pip install in this environment, we need to open the Anaconda Prompt and switch to (testpy) mode.

conda can also be used to install and uninstall packages, but it has to be run from the Anaconda Prompt.
For the default Python (Spyder) we use the (base) prompt; for a specific environment like testpy we use the (testpy) prompt.

The commands look like this:
(testpy) C:\Users\acw393>conda install -n testpy filterpy
(base) C:\Users\acw393>conda install numpy


4. Git

Monday, 18 February 2019

[Tutorial] ADLs recognition using machine learning and deep learning: taking Shoulder Physiotherapy Exercise Recognition as an example

Introduction

Learning and recognizing human activities, e.g. activities of daily living (ADLs), is not only very useful for building pervasive home monitoring systems; ADLs are also important indicators of both cognitive and physical well-being in healthy and ill people. People can benefit a lot from ADL recognition. For example, it can 1) allow computing systems to proactively assist users with their tasks; 2) provide more information from past activities for medical diagnosis; 3) assist patients with chronic impairments, support personal fitness training and rehabilitation, and encourage people to adopt a healthy lifestyle; 4) keep young children away from dangerous areas (e.g. the stove or a balcony); 5) change the gaming experience (e.g. the Microsoft Kinect).

Recently, population ageing has become a global phenomenon, driven by longer life expectancy and fewer children. The ageing population and the changing structure of the population will bring both opportunities and challenges for the economy, services and society at national and local levels. Society therefore pays more attention to healthcare for older people, and care for patients also remains in the spotlight. To provide better and more timely care for both groups, researchers are leveraging different sensors, e.g. WiFi, UWB, inertial sensors and cameras, to detect people's ADLs. This is so-called human activity recognition (HAR).

So in this post, you will see:
1) what the HAR process, namely the HAR chain, looks like;
2) how to recognize ADLs using machine learning and deep learning algorithms.
The experimental data and source code used here are referenced in the References section at the end of this post.

HAR chain

Generally, the sensor-based HAR chain includes:
  1. Data Collection
  2. Data Segmentation
  3. Feature Normalisation or Scaling 
  4. Feature Extraction, Feature Selection (if necessary)
  5. Classifier Selection
  6. Evaluation
Next, I will show you the details of each step using our example.

Experiment setup

Twenty healthy adult subjects with asymptomatic shoulders and no prior shoulder surgery were recruited and provided informed consent for participation in this study. The subjects' mean age was 28.9 years (range 19-56). There were 14 male and 6 female subjects. Fifteen subjects were right-hand dominant, and five were left-hand dominant.

Under the supervision of an orthopedic surgeon, each subject performed 20 repetitions of seven shoulder exercises bilaterally. The sensor used here is an Apple Watch located on the wrist of the subjects' dominant hands. The exercises performed are elements of an evidence-based rehabilitation protocol for full-thickness atraumatic rotator cuff tears (Kuhn et al. 2013) and included:
  1. pendulum (PEN)
  2. abduction (ABD)
  3. forward elevation (FEL)
  4. internal rotation (IR)
  5. external rotation (ER)
  6. trapezius extension (TRAP)
  7. upright row (ROW)

Data Collection

The 6-axis raw sensor data consists of total acceleration a = [ax, ay, az] and rotational velocity ω = [ωx, ωy, ωz], measured in the coordinate frame of the watch. No further preprocessing or filtering was applied to the raw data. The sensor data was acquired from the active extremity using an Apple Watch (Series 2 & 3) with the PowerSense app, sampling at fs = 50 Hz.
Fig 1 activity: pendulum (PEN)
Fig 2 activity: abduction (ABD)
Fig 3 activity: forward elevation (FEL)
Fig 4 activity: internal rotation (IR)
Fig 5 activity: external rotation (ER)
Fig 6 activity: trapezius extension (TRAP)
Fig 7 activity: upright row (ROW)

Data Preprocessing: data segmentation

The raw sensor data was segmented using overlapping fixed-length sliding windows W for each of the six sensor signals. The 3D temporal signal tensor ϕ produced from the set of windows has a shape (N, L/fs, 6), where L is the window length. An exercise label was attributed to a window from the ground-truth annotation when the exercise was performed for the entirety of that window.
Fig 7 sketch for data segmentation 
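
To make the segmentation step concrete, here is a minimal NumPy sketch of overlapping fixed-length sliding windows over one recording; the window length and overlap below are illustrative values, not necessarily those used in the paper:

import numpy as np

def sliding_windows(signal, window_length, overlap):
    # signal: array of shape (T, 6) -- one recording of the 6-axis sensor data
    # window_length: samples per window (e.g. 250 samples = 5 s at 50 Hz)
    # overlap: fraction of overlap between consecutive windows (e.g. 0.5)
    step = int(window_length * (1 - overlap))
    starts = range(0, len(signal) - window_length + 1, step)
    return np.stack([signal[s:s + window_length] for s in starts])  # (N, window_length, 6)

# toy example: 30 s of random 6-axis data sampled at 50 Hz
fake_recording = np.random.randn(1500, 6)
segments = sliding_windows(fake_recording, window_length=250, overlap=0.5)
print(segments.shape)  # (11, 250, 6)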


Feature Normalisation or Scaling

Since the range of values of raw data varies widely, in some machine learning algorithms, objective functions will not work properly without normalization. Another reason why feature scaling is applied is that gradient descent converges much faster with feature scaling than without it.

Commonly used normalization methods include rescaling, standardization and scaling to unit length, sketched in code below.
Rescaling: the simplest method is rescaling the range of each feature to [0, 1] or [−1, 1].
Standardization: feature standardization makes the values of each feature have zero mean and unit variance.
Scaling to unit length: scale the components of a feature vector so that the complete vector has length one.
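
A minimal NumPy sketch of the three scaling methods applied to a toy feature matrix (rows are samples, columns are features):

import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])  # toy feature matrix: 3 samples, 2 features

# Rescaling (min-max): map each feature to the range [0, 1]
X_rescaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization: zero mean and unit variance per feature
X_standardized = (X - X.mean(axis=0)) / X.std(axis=0)

# Scaling to unit length: divide each sample (row) by its Euclidean norm
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)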

Feature Extraction and Feature Selection

A feature mapping F(W) comprising typical HAR statistical and heuristic features was computed to define the feature space for the classifiers. An identical set of univariate features, mean, variance (σ^2), standard deviation (σ), maximum, minimum, skewness, kurtosis, mean crossings (ζ), mean spectral energy (ξ) and a 4-bin histogram, was computed for each signal vector in each segment. Of course, there are also other features such as FFT amplitude and frequency, zero crossing rate, mean crossing rate, mean of the gradient and so on.
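
As a rough sketch (not the exact implementation used by seglearn's FeatureRep, which computes a similar set for you), these window-level features could be computed with NumPy and SciPy like this:

import numpy as np
from scipy import stats

def window_features(window):
    # window: array of shape (L, 6) -- one segment of the six sensor signals
    feats = []
    for col in window.T:  # one signal vector at a time
        feats += [col.mean(), col.var(), col.std(), col.max(), col.min(),
                  stats.skew(col), stats.kurtosis(col)]
        # mean crossings: number of times the signal crosses its own mean
        feats.append(int(np.sum(np.diff((col > col.mean()).astype(int)) != 0)))
        # mean spectral energy of the signal
        feats.append(float(np.mean(np.abs(np.fft.rfft(col)) ** 2)))
        # 4-bin histogram of the signal values
        feats += list(np.histogram(col, bins=4)[0])
    return np.array(feats)

# example: features for one 250-sample window of fake data
print(window_features(np.random.randn(250, 6)).shape)  # (78,)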

Although we could keep adding features, there is no guarantee that more features mean higher recognition accuracy. Therefore, we need to choose the most effective features for both efficiency and accuracy. Common methods include principal component analysis (PCA) and linear discriminant analysis (LDA).
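
For example, a hedged sketch of PCA with scikit-learn (the matrix size and the number of components are made up for illustration):

import numpy as np
from sklearn.decomposition import PCA

X_features = np.random.randn(200, 78)   # stand-in for the real window-feature matrix
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X_features)   # shape (200, 10)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained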

The feature extraction and feature selection introduced here are mainly for the machine learning (ML) algorithms, which require us to extract features manually. With the rise of deep learning, however, many researchers prefer deep neural networks because they can extract features automatically, and those features often capture more information that benefits recognition accuracy.
But keep in mind that ML and deep learning each have their strengths; deep learning is not suitable for every circumstance. In situations without enough samples, ML is more likely to achieve good accuracy.

Classification  

Commonly used classifiers include, on the ML side, decision trees, k-nearest neighbors (k-NN), the naive Bayes classifier, support vector machines (SVM) and random forests (RF), and, on the deep learning side, DNN, RNN, LSTM, RNN+LSTM combinations and so on. In this post, we will use both ML and deep learning algorithms and compare them.
Given the limited space, I am not going to discuss these classifiers in depth; you can refer to outside resources if you are interested.

Evaluation

Similarly, there are multiple indicators for evaluation: accuracy, precision, recall and F-measure.

  • Accuracy is the most intuitive performance measure: it is simply the ratio of correctly predicted observations to the total observations. One may think that if we have high accuracy our model must be the best. Accuracy is a great measure, but only when you have symmetric datasets where the counts of false positives and false negatives are almost the same. Otherwise, you have to look at other parameters to evaluate the performance of your model. 
  • Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. 
  • Recall (sensitivity) is the ratio of correctly predicted positive observations to all observations in the actual class.
  • F1 score is the weighted average of precision and recall, so it takes both false positives and false negatives into account: F1 = 2 * (precision * recall) / (precision + recall). It is also called the F score or the F measure. Put another way, the F1 score conveys the balance between precision and recall.
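
All of these indicators are available in scikit-learn; here is a minimal sketch on toy labels (the two arrays below are made up purely for illustration):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = ['PEN', 'ABD', 'FEL', 'PEN', 'ROW', 'ABD']   # toy ground-truth labels
y_pred = ['PEN', 'ABD', 'PEN', 'PEN', 'ROW', 'FEL']   # toy predictions

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average='macro'))
print("Recall:   ", recall_score(y_true, y_pred, average='macro'))
print("F1 score: ", f1_score(y_true, y_pred, average='macro'))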

Algorithms Implementation

Note that the code used here is written in Python; for the source code, please refer to seglearn (see References).

The most convenient way to install it is pip install seglearn, as shown below:
Fig 8 pip install seglearn

  •  Linear SVM classification

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

from seglearn.datasets import load_watch
from seglearn.pipe import Pype
from seglearn.transform import FeatureRep, PadTrunc

# load the data
data = load_watch()
X = data['X']
y = data['y']

# create a feature representation pipeline with PadTrunc segmentation
# the time series are between 20-40 seconds
# this truncates them all to the first 5 seconds (sampling rate is 50 Hz)

pipe = Pype([('trunc', PadTrunc(width=250)),
             ('features', FeatureRep()),
             ('scaler', StandardScaler()),
             ('svc', LinearSVC())])

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=True,
                                                    random_state=42)

pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)

print("N series in train: ", len(X_train))
print("N series in test: ", len(X_test))
print("N segments in train: ", pipe.N_train)
print("N segments in test: ", pipe.N_test)
print("Accuracy score: ", score)

  •  Random forest classification

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

from seglearn.datasets import load_watch
from seglearn.pipe import Pype
from seglearn.transform import FeatureRep, PadTrunc

# load the data
data = load_watch()
X = data['X']
y = data['y']

# create a feature representation pipeline with PadTrunc segmentation
# the time series are between 20-40 seconds
# this truncates them all to the first 5 seconds (sampling rate is 50 Hz)

pipe = Pype([('trunc', PadTrunc(width=250)),
             ('features', FeatureRep()),
             ('scaler', StandardScaler()),
             ('RF', RandomForestClassifier())])

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=True,
                                                    random_state=42)

pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)

print("N series in train: ", len(X_train))
print("N series in test: ", len(X_test))
print("N segments in train: ", pipe.N_train)
print("N segments in test: ", pipe.N_test)
print("Accuracy score: ", score)

  •  k-nearest neighbors (k-NN) classification

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

from seglearn.datasets import load_watch
from seglearn.pipe import Pype
from seglearn.transform import FeatureRep, PadTrunc

# load the data
data = load_watch()
X = data['X']
y = data['y']

# create a feature representation pipeline with PadTrunc segmentation
# the time series are between 20-40 seconds
# this truncates them all to the first 5 seconds (sampling rate is 50 Hz)

pipe = Pype([('trunc', PadTrunc(width=250)),
             ('features', FeatureRep()),
             ('scaler', StandardScaler()),
             ('KNN', KNeighborsClassifier(n_neighbors=7))])

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=True,
                                                    random_state=42)

pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)

print("N series in train: ", len(X_train))
print("N series in test: ", len(X_test))
print("N segments in train: ", pipe.N_train)
print("N segments in test: ", pipe.N_test)
print("Accuracy score: ", score)

  •  Convolution and RNN (LSTM) combination classification
from keras.layers import Dense, LSTM, Conv1D
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import train_test_split

from seglearn.datasets import load_watch
from seglearn.pipe import Pype
from seglearn.transform import SegmentX


def crnn_model(width=100, n_vars=6, n_classes=7, conv_kernel_size=5,
               conv_filters=10, lstm_units=10):
    input_shape = (width, n_vars)
    model = Sequential()
    model.add(Conv1D(filters=conv_filters, kernel_size=conv_kernel_size,
                     padding='valid', activation='relu', input_shape=input_shape))
    model.add(Conv1D(filters=conv_filters, kernel_size=conv_kernel_size,
                     padding='valid', activation='relu'))
    model.add(LSTM(units=lstm_units, dropout=0.1, recurrent_dropout=0.1))
    model.add(Dense(n_classes, activation="softmax"))

    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])

    return model


# load the data
data = load_watch()
X = data['X']
y = data['y']

# create a segment learning pipeline
width = 100

pipe = Pype([('seg', SegmentX(width=width, order='C')),
             ('crnn', KerasClassifier(build_fn=crnn_model, epochs=8, batch_size=256, verbose=0))])
# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

pipe.fit(X_train, y_train)
score = pipe.score(X_test, y_test)

print("N series in train: ", len(X_train))
print("N series in test: ", len(X_test))
print("N segments in train: ", pipe.N_train)
print("N segments in test: ", pipe.N_test)
print("Accuracy score: ", score)

  • Evaluation
This part you can implement yourself; remember that we use the F-score as the evaluation indicator, as sketched below.
Run the code above to see which algorithm achieves better accuracy, and then see what else you can find.
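
Here is a hedged sketch of how the F-score could be computed for any of the fitted pipelines above, assuming seglearn's Pype exposes transform_predict (which returns the transformed labels together with the predictions, as in the seglearn documentation examples):

from sklearn.metrics import classification_report, f1_score

# `pipe`, `X_test` and `y_test` come from one of the pipelines fitted above.
# Segmentation can change the number of samples, so we ask the pipeline for
# the transformed labels alongside its predictions.
y_true, y_pred = pipe.transform_predict(X_test, y_test)

print("Macro F1 score: ", f1_score(y_true, y_pred, average='macro'))
print(classification_report(y_true, y_pred))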

Conclusion

In this post, I mainly introduced the HAR chain and how to use machine learning and deep learning algorithms to recognize seven classes of activities, using the dataset from the Shoulder Physiotherapy Exercise Recognition work.

Since the mathematical principles of machine learning and deep learning are complicated, and to avoid going off topic, I didn't dig deeply into the explanation of these algorithms. However, if you want to do some innovation, I recommend sparing some time to seriously learn their principles.

Back to our topic: human activity recognition is a hotspot of recent research. If you share this interest, we can dig deeper together.

Reference

seglearn: https://dmbee.github.io/seglearn/install.html
Shoulder Physiotherapy Exercise Recognition: Machine Learning the Inertial Signals from a Smartwatch: https://arxiv.org/pdf/1802.01489.pdf
