Citation Information : International Journal on Smart Sensing and Intelligent Systems. Volume 13, Issue 1, Pages 1-13, DOI: https://doi.org/10.21307/ijssis-2020-008
License : (BY-NC-ND-4.0)
Received Date : 30-March-2020 / Published Online: 25-May-2020
Many software-based automation technologies make everyday life more convenient. One of them collects data through the cameras and sensors that are common in personal life and automatically recognizes humans and human activities. The goal of this automation is to analyze the various types of big data on which mechanical data mining is difficult to perform. Before analysis, raw data collected from cameras and sensors are nothing but big data, and the most important issue at that stage is how to protect the data through secure storage. However, once context-aware semantic information, such as a specific person and his or her behavior, is extracted by the analysis, the security sensitivity increases. In other words, the secondary information generated by interpreting and extracting the personal location and behavioral information contained in images and videos can be linked to other personal information, causing privacy infringement. Privacy issues become important because a great deal of such software is accessible to everyone. Therefore, it is necessary to study privacy protection methods for the automatic recognition of humans and human activities. This paper analyzes the cutting-edge research trends, techniques, and issues of privacy-preserving human and human activity recognition.
As more and more data are collected and the technology to process them develops, the importance of data is growing. At the same time, technology is needed to protect privacy when sensitive data are processed. Every stage of data handling, covering raw data, data being processed, and result data, requires privacy protection. The general approaches to preventing privacy leakage are anonymity, access control, and transparency (Haris et al., 2014). With the introduction of machine learning (ML), big data processing is in full swing, but the task of privacy protection remains.
Machine learning technology has been actively introduced into big data processing and applied in many applications where mechanical data mining is difficult. However, privacy concerns arise in applications that extract information through deep learning (Tanuwidjaja et al., 2019). Privacy protection is essential as the application of deep learning expands from medical applications that process sensitive information such as patient diseases (Tanuwidjaja et al., 2019) to applications that analyze data collected by cameras and sensors to extract personal information (Wang et al., 2019).
There are several ways in which the machine learning approach can violate the user’s privacy. Figure 1 shows a machine learning process (Osia et al., 2018), and Figure 2 shows the privacy issues that arise during that process. Privacy may be violated when: (i) a data holder shares a public dataset: the anonymity of individuals is threatened; (ii) data holders participate in a model training procedure with their private data; (iii) a model provider shares a publicly learned model: the privacy of the individuals’ data used for training is at risk; (iv) an end user shares his/her data with the service provider: private information can be revealed to the service provider; (v) a service provider shares query answers with the end user: an attacker can infer the model itself by launching repeated queries.
In the field of vision-based machine learning, many studies have shown serious privacy concerns as described in Table 1.
Collaborative machine learning and federated learning allow multiple participants, each with his/her own training dataset, to build a joint model by training locally and periodically exchanging model updates (Melis et al., 2018). However, the updates can leak unintended information about participants’ training data, and passive and active inference attacks can exploit this leakage as shown in Figure 3.
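The local-training-and-exchange loop can be sketched as follows. This is a minimal illustration only, assuming a one-parameter linear model y = w·x, plain gradient descent, and unweighted averaging of the participants' weights; the names `local_update` and `federated_round` are hypothetical and are not taken from Melis et al. (2018). Each weight a participant returns depends on its private data, which is the channel through which updates can leak information.

```python
def local_update(w, data, lr=0.1, steps=5):
    """Train locally on (x, y) pairs for a one-parameter model y = w * x."""
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w, participants):
    """One round: each participant trains locally, then the weights are averaged."""
    local_weights = [local_update(global_w, data) for data in participants]
    return sum(local_weights) / len(local_weights)


# Hypothetical private data sets; both follow y = 2x.
participants = [
    [(1.0, 2.0), (2.0, 4.0)],  # participant A
    [(1.0, 2.0), (3.0, 6.0)],  # participant B
]

w = 0.0
for _ in range(20):
    w = federated_round(w, participants)
```

Only the weights leave each participant, never the raw (x, y) pairs; the joint model nevertheless converges toward the slope shared by both data sets.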
In addition, with big data processing, both population privacy and individual privacy become important (Cormode et al., 2012). Population privacy is violated by disclosing that a specific group of people is highly susceptible to a given genetic condition, and individual privacy is violated by disclosing that a specific patient has that condition. In general applications, it is difficult to hide all the information in big data with cryptography because cryptography is resource-intensive and overly complex. Instead, many approaches remove the sensitive parts of the information while preserving the information needed for further analysis (Osia et al., 2020).
Privacy protection is especially important in fields that deal with human-related data. Machine learning has been widely applied to the recognition of humans and human activities.
Human recognition is very useful in many application domains, for example, autonomous driving, post-disaster rescue, automated surveillance, and military and robotics services (Gajjar et al., 2017). Human face recognition is another well-known application, used to search for a specific person in videos or in a collection of images. Sensor data other than images or videos can be used to determine whether a human is present. However, videos or images are commonly used to track and recognize specific persons and human diseases. Videos and images are sensitive data because they directly reveal human characteristics. Even if the original big data such as images or videos are stored safely, the secondary information extracted by deep learning also carries a risk of leaking private information. In the medical field, automated analysis technology is used for disease diagnosis and tracking. Personal disease information also raises concerns about the invasion of privacy because secondary semantic information is extracted through deep learning for diagnosis.
Recently, human activity recognition has been used in many applications such as smart homes, healthcare, and manufacturing. Human activity is mainly recognized by cameras or sensors (Chen et al., 2020). Cameras and sensors are embedded not only in many large electronic systems, such as vehicles, home appliances, and surveillance systems, but also in many portable Internet of Things (IoT) devices and wearable devices, such as smartphones, watches, and Fitbits. These devices are becoming available to everyone, so they can easily collect personal-area data, and those data can be used to recognize human activity. In human activity recognition, sensor-based approaches have been used more than video-based approaches because of privacy concerns about placing cameras in personal spaces. However, data about user behavior that are continuously measured and generated by user-friendly IoT devices (Iwasawa et al., 2017) allow an adversary to infer private information about the user such as age, gender (Lu et al., 2013; Jain and Kanhangad, 2016), or possibly level of health (Iwasawa et al., 2017; Chen et al., 2018).
Privacy issues become important because there is much software for the automatic processing of big data, and this software is easily accessible to anyone. Moreover, anyone can collect big data about humans and human activities, and the person observed may be oneself or a family member. Therefore, it is necessary to study privacy protection methods for the automatic recognition of humans and human activities. This paper analyzes the cutting-edge research trends, techniques, and issues of privacy-preserving human and human activity recognition.
The rest of this paper is organized as follows. In the second section, human recognition is analyzed in terms of its applications, its approaches, and its privacy vulnerability. In the third section, human activity recognition is analyzed in the same terms. In the fourth section, privacy-preserving approaches to human and human activity recognition are discussed. In the fifth section, we conclude this paper and briefly discuss possible directions for future work.
Table 2 shows recent research on human recognition and related privacy issues. Privacy issues exist in the deep learning-based approaches.
Most human detection tasks are still based on visual images (Hwang et al., 2015). Many intelligent and complex video surveillance systems present a double-edged sword between high detection performance and privacy protection. When photos or images are recorded and processed by the surveillance system, the individuals and groups captured in them may be exposed unintentionally and analyzed for purposes other than the system’s original one (Chattopadhyay and Boult, 2007; Ren et al., 2016; Wu et al., 2018) as shown in Figure 4.
Song and Shmatikov (2020) and Nelus and Martin (2019) introduced overlearning. That is, a model trained for a seemingly simple objective implicitly learns to recognize attributes and concepts that are privacy-sensitive. For example, a binary gender classifier of facial images also learns to recognize races (even races that are not represented in the training data) and identities.
Gajjar et al. (2017) detected and tracked humans in video surveillance using histogram of oriented gradients (HOG) features. Nike and Malinowski (2010) tracked users’ activities by using GPS and other sensors on mobile devices.
Gomathisankaran et al. (2013) considered privacy leakage during medical image processing. Wang et al. (2014) detected the mental health, performance, and behavioral trends of students by using sensing data from smartphones. Ertin et al. (2011) tried to understand the psychological state of the user in real time by using sensors to record physiological data.
Table 3 shows recent research on human activity recognition and related privacy issues.
Iwasawa et al. (2017) showed that deep neural networks (DNN) can reveal user-discriminative features unintentionally. Because a DNN has the black-box property, it is hard to predict what it learns from training data. In other words, a DNN can learn user information, and the application may then disclose that information unintentionally and without the user’s consent. You et al. (2012) proposed CarSafe, an application that learns the driving behaviors of users by using two cameras.
Chen et al. (2018), Hu et al. (2019), and Zhang et al. (2019) considered the case in which collected time series data are shared to infer the users’ physical activities, as shown in Figure 5. Personal information can also be inferred from the same data that are used for activity recognition. This is because people’s activity patterns vary with personal attributes such as age and gender (Lu et al., 2013; Jain and Kanhangad, 2016).
Phan et al. (2016) collected health social network data and considered privacy preservation.
This section analyzes and summarizes privacy-preserving approaches in human and human activity recognition. Many studies combine several methods to protect human-related privacy. Recently, as deep learning has come into use in human and human activity recognition, privacy-preserving approaches have additionally begun to address privacy leakage related to the characteristics of deep learning. In Table 4, privacy issues and privacy-preserving approaches raised in recent studies are analyzed.
Garcia and Jacobs (2010) and Fontaine and Galand (2007) proposed complete data isolation using cryptography. Garcia and Jacobs (2010) and Gomathisankaran et al. (2013) adopted homomorphic encryption (HE). Sensitive data are processed in encrypted form, so no information leaks during processing, as shown in Figure 6 (El-Yahyaoui and Ech-Cherif El Kettani, 2019). Another approach encrypts only the sensitive areas in images or videos as shown in Figure 7 (Chattopadhyay and Boult, 2007).
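To illustrate what processing encrypted data means, the following toy sketch uses the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. Paillier is chosen here only for brevity and is not necessarily the scheme used in the cited works. The small primes and the function names are assumptions for readability and provide no real security.

```python
import math
import random


def keygen(p=293, q=433):
    """Toy Paillier key generation with tiny, insecure primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; requires Python 3.8+
    return n, lam, mu


def encrypt(n, m, rng=random):
    """Encrypt an integer 0 <= m < n using the generator g = n + 1."""
    n_sq = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(1 + n, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n, lam, mu, c):
    """Recover the plaintext from ciphertext c with the private values lam and mu."""
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


n, lam, mu = keygen()
c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
total = decrypt(n, lam, mu, c_sum)
```

The party that computes `add_encrypted` never sees 20, 22, or their sum; only the holder of `lam` and `mu` can decrypt the result.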
On the other hand, anonymized videos are intentionally captured or processed at a deliberately low quality that only allows for the recognition of some target events or activities (Butler et al., 2015; Dai et al., 2015; Ryoo et al., 2017; Ren et al., 2018) as shown in Figure 8. Winkler et al. (2014) introduced cartoon-like effects as shown in Figure 9. Speciale et al. (2019) protected confidential information about the captured 3D scene by lifting the map representation from a 3D point cloud to a 3D line cloud. In Figure 10, (a) shows that a 3D point cloud reveals potentially confidential information in the scene. In contrast, (b) protects user privacy by concealing the scene geometry and preventing inversion attacks, while still enabling accurate and efficient localization.
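One simple way to produce a low-quality frame is to average non-overlapping pixel blocks. The sketch below is an assumption-laden illustration of that idea only: it uses plain block averaging on a grayscale image stored as nested lists, whereas the cited works rely on their own capture settings or learned transformations. The function name `downsample` is hypothetical.

```python
def downsample(image, factor):
    """Average non-overlapping factor x factor blocks of a grayscale image.

    `image` is a list of rows of pixel intensities; rows and columns that do
    not fill a complete block are dropped.
    """
    height, width = len(image), len(image[0])
    out = []
    for top in range(0, height - factor + 1, factor):
        row = []
        for left in range(0, width - factor + 1, factor):
            block = [image[top + dy][left + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


frame = [
    [0, 2, 4, 4],
    [2, 4, 4, 4],
    [8, 8, 1, 1],
    [8, 8, 1, 1],
]
low_res = downsample(frame, 2)
```

Fine detail inside each block, such as facial features, is averaged away, while coarse structure that may suffice for activity recognition is retained.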
Garcia Lopez et al. (2015) proposed edge computing for processing isolation; processing is performed near where the data are collected, as shown in Figure 11. Central computing is a technique for collecting data in a central data center and performing intensive processing there. In contrast, edge computing is a technology that processes data on a user’s device or near the point where the data are collected or generated. Because data are analyzed immediately at the edge, where they are collected, and applied in the field, edge computing is regarded as a computing technology that can ensure more immediate response and reliability than a central data center such as the cloud (Xiao et al., 2019). That is, compared to central computing, edge computing supports a wide range of device mobility, has a low risk of data center hacking because data processing is distributed, and has a short delay in data transmission and response. Central computing has an advantage in high-performance processing of big data, but edge computing is more efficient for applications that are sensitive to network failures or delays, such as autonomous vehicles, drones, or airplane engines.
Liu (2019) and Bun and Steinke (2016) provided a strong privacy guarantee by adding noise to perturb the response to a statistical query drawn from a population-scale database. They concealed the presence or absence of a user in the database by differential privacy (Dwork, 2008) as shown in Figure 12 (Wood et al., 2018). With differential privacy, general information about the entire population in a data set can be obtained without revealing individual information. In Figure 12, the difference between the analysis result for the real-world data set and the analysis result for the data set from which X has opted out is at most ε. This means that private information can be shielded in statistical database analysis. The approach is based on ε-differential privacy (Dwork, 2008): random noise is injected into the released statistical results computed from the underlying sensitive data, such that the distribution of the noisy results is relatively insensitive to any change of a single record in the original data set.
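Formally, a randomized mechanism M is ε-differentially private if Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] for all output sets S and all data sets D and D′ that differ in a single record (Dwork, 2008). A standard way to meet this definition for a counting query is the Laplace mechanism, sketched below. The function names and the example records are hypothetical, and the sketch is not the specific method of the cited works.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=random):
    """Answer a counting query under epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1 / epsilon is sufficient.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records: (user id, has the condition).
records = [("a", True), ("b", False), ("c", True), ("d", True)]
noisy = private_count(records, lambda r: r[1], epsilon=0.5)
```

A smaller ε gives a larger noise scale and therefore stronger privacy at the cost of a less accurate answer.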
Bian et al. (2020) proposed a combined approach for privacy-preserving image recognition using homomorphic encryption, secret sharing protocol, and homomorphic convolution.
As privacy protection approaches for deep learning, Iwasawa et al. (2017), Malekzadeh et al. (2018, 2019), Ajakan et al. (2015), Edwards and Storkey (2016), and Osia et al. (2020) proposed adversarial training to suppress information disclosure. Ajakan et al. (2015) and Edwards and Storkey (2016) introduced the adversarial training frameworks domain-adversarial neural network (DANN) and adversarial learned fair representations (ALFR), respectively, to remove sensitive information from representations. Adversarial training increases robustness by augmenting training data with adversarial examples, because machine learning models are often vulnerable to adversarial examples, that is, maliciously perturbed inputs designed to mislead a model at test time (Tramèr et al., 2018). Iwasawa et al. (2017) proposed an adversarial training framework in which information sources are categorized into multiple features rather than binary features; DANN and ALFR used binary features. Malekzadeh et al. (2018, 2019) also proposed integrating an adversarial loss with the standard activity classification loss. However, an adversarial loss function can protect only one kind of private information at a time, such as user identity or gender. Iwasawa et al. (2017), Malekzadeh et al. (2018), and Osia et al. (2020) require labels for the private information in order to perform adversarial training. In Figure 13, z and y are sensitive variables. The z-predictor uses only f1, whereas the y-remover uses both f1 and f2.
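The combination of a task loss with an adversarial loss is often written as a min-max objective. The following is a generic formulation given for orientation only, not the exact objective of any cited work. All symbols are assumptions introduced here: F is the feature extractor, C the task classifier, A the adversary that tries to predict the sensitive attribute s from the representation, t the task label, and λ a trade-off weight.

```latex
\min_{\theta_F,\,\theta_C}\;\max_{\theta_A}\;
\mathcal{L}_{\mathrm{task}}\bigl(C_{\theta_C}(F_{\theta_F}(x)),\, t\bigr)
\;-\;\lambda\,\mathcal{L}_{\mathrm{adv}}\bigl(A_{\theta_A}(F_{\theta_F}(x)),\, s\bigr)
```

The adversary lowers its own loss, while the feature extractor is rewarded for keeping the task loss low and the adversary's loss high, which pushes information about s out of the representation. Because the second term needs s, this formulation requires labels for the private attribute.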
Zhang et al. (2019) adopted image style transformation to protect all private information at once while still allowing the desired information to be inferred normally. The presented approach transforms raw sensor data into a new format that has the ‘style’ (sensitive information) of random noise and the ‘content’ (desired information) of the raw sensor data as shown in Figure 14. A pre-trained LossNet is used to define the loss functions that measure the ‘style’ difference between the transformed data and random noise and the ‘content’ difference between the transformed data and the raw data.
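The two loss terms can be combined into a single training objective. The weighted sum below is an assumed form in the manner of standard style transfer, not a formula quoted from Zhang et al. (2019). The symbols are introduced here for illustration: x is the raw sensor data, n the random noise, x̂ the transformed data, and α and β hypothetical weights.

```latex
\mathcal{L}(\hat{x}) \;=\;
\alpha\,\mathcal{L}_{\mathrm{content}}(\hat{x},\, x)
\;+\;\beta\,\mathcal{L}_{\mathrm{style}}(\hat{x},\, n)
```

Minimizing the first term keeps the desired information of the raw data, and minimizing the second pulls the sensitive 'style' toward that of random noise.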
Phan et al. (2016), Abadi et al. (2016), and Papernot et al. (2017) used differential privacy to protect training data. Instead of applying differential privacy to the query process for large statistical data sets, they applied it during machine learning. Phan et al. (2016) proposed a privacy-preserving encoder, the deep private auto-encoder (dPA), by developing an ε-differential privacy-preserving deep learning model. That is, they enforced ε-differential privacy by perturbing the objective functions of the traditional deep auto-encoder. Abadi et al. (2016) proposed differentially private deep models, and Papernot et al. (2017) applied differential privacy in a way that is not specific to the learning model.
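A common way to apply differential privacy during training is to perturb the gradients: each example's gradient is clipped to a maximum norm, the clipped gradients are summed, Gaussian noise is added, and the result is averaged. The sketch below follows that gradient-perturbation style, which is associated with Abadi et al. (2016); it is not the objective-perturbation method of Phan et al. (2016). Function and parameter names are hypothetical, and the privacy accounting that converts the noise level into an ε value is omitted.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier,
                rng=random):
    """One noisy gradient step over a batch of per-example gradients."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(weights)
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Gaussian noise with standard deviation noise_multiplier * max_norm.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    batch = len(per_example_grads)
    return [w - lr * n / batch for w, n in zip(weights, noisy)]


# Hypothetical per-example gradients for a two-parameter model.
grads = [[3.0, 4.0], [0.3, 0.4]]
new_weights = dp_sgd_step([0.0, 0.0], grads, lr=1.0, max_norm=1.0,
                          noise_multiplier=1.1)
```

Clipping bounds how much any single training example can influence the update, and the noise masks whatever influence remains.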
Tramèr et al. (2016) and Wang and Gong (2018) considered the privacy of the learning model itself. An adversary can infer the model parameters by making many queries to the learning model as shown in Figure 15. Here, f is the trained model held by the data owner, and an attacker uses q queries to extract an approximation of f. Wang and Gong (2018) introduced hyper-parameter stealing attacks applicable to a variety of popular machine learning algorithms such as ridge regression, logistic regression, support vector machines, and neural networks. Juuti et al. (2019) proposed PRADA to protect against the model stealing attack. It analyzes the distribution of consecutive API queries and raises an alarm when this distribution deviates from benign behavior. Kariyappa and Qureshi (2019) substantially degraded the accuracy of the attacker’s clone model by selectively sending incorrect predictions in response to attackers’ queries.
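The idea of recovering parameters from query answers can be seen in the simplest possible case. The sketch below assumes, purely for illustration, that the secret model is linear and that the prediction API returns its raw real-valued output; under those assumptions, d + 1 queries determine all parameters exactly. It is not the specific attack of the cited papers, and all names are hypothetical.

```python
def make_secret_model(weights, bias):
    """Return a prediction function that hides its parameters from the caller."""
    def predict(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return predict


def extract_linear_model(query, dim):
    """Recover the weights and bias of a linear model using dim + 1 queries."""
    # Querying the zero vector reveals the bias.
    bias = query([0.0] * dim)
    weights = []
    for i in range(dim):
        # Querying the i-th unit vector reveals weight i plus the bias.
        unit = [0.0] * dim
        unit[i] = 1.0
        weights.append(query(unit) - bias)
    return weights, bias


f = make_secret_model([2.0, -1.0, 0.5], 3.0)
stolen_weights, stolen_bias = extract_linear_model(f, dim=3)
```

Defenses such as returning perturbed or deliberately incorrect answers aim to break exactly this kind of direct relationship between answers and parameters.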
The approaches in Table 4 protect privacy by securing the entire data set and its processing using encryption, anonymity, and isolation. For large statistical data sets, differential privacy is applied to prevent individual personal information from leaking. When a neural network is adopted in data processing, adversarial training is proposed against adversarial examples that mislead the learning models. Improving the effectiveness of privacy protection through data format conversion has also been considered. In addition, differential privacy has been applied to data and learning models by inserting noise to prevent the disclosure of private information during machine learning. There are also approaches that prevent the inference or theft of learning models in machine learning.
On the other hand, membership inference attacks against machine learning models have been reported (Shokri et al., 2017). In a membership inference attack, given a machine learning model and a record, an attacker determines whether the record was used as part of the model’s training dataset. Shokri et al. (2017) showed that privacy can be breached by membership inference in supervised machine learning. To mitigate the breach, they suggested reducing overfitting through regularization and using simpler structures in machine learning models. Differentially private models are robust to this attack, but their prediction accuracy drops for small ε values. Adversarial training makes learning models robust by preventing adversarial attacks from biasing them toward a certain result. Song et al. (2019) showed that adversarially trained models are vulnerable to membership inference attacks. Moreover, an increase in the robustness of the adversarially trained model is correlated with an increase in the success of the membership inference attack due to adversarial generalization. Nasr et al. (2018) and Hayes and Ohrimenko (2018) designed privacy mechanisms to reduce adversarial generalization. However, the relationship between membership inference attacks and adversarial training requires further in-depth research.
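The link between overfitting and membership inference can be illustrated with a simple confidence-threshold attack: an overfitted model tends to be more confident on records it was trained on, so high confidence is taken as evidence of membership. This is a simplified variant chosen for brevity, not the shadow-model attack of Shokri et al. (2017). The threshold, the function names, and the confidence values are hypothetical.

```python
def infer_membership(confidence_on_true_label, threshold=0.9):
    """Guess that a record was in the training set if the model is very confident."""
    return confidence_on_true_label >= threshold


def attack_accuracy(member_confidences, nonmember_confidences, threshold=0.9):
    """Fraction of records whose membership status the attacker guesses correctly."""
    correct = sum(infer_membership(c, threshold) for c in member_confidences)
    correct += sum(not infer_membership(c, threshold)
                   for c in nonmember_confidences)
    return correct / (len(member_confidences) + len(nonmember_confidences))


# Hypothetical model confidences on the true label of each record.
members = [0.99, 0.95, 0.80]
nonmembers = [0.60, 0.92, 0.50]
accuracy = attack_accuracy(members, nonmembers)
```

An accuracy well above 0.5 on balanced data indicates leakage; regularization narrows the confidence gap between members and non-members and pushes the attack back toward chance.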
If the approach that controls privacy is static, it is difficult to ensure satisfactory privacy preservation for dynamic context-aware applications. Large amounts of sensor data and context-aware applications create new types of ambiguous privacy issues that make it difficult for users to determine which data are sensitive (Haris et al., 2014). Therefore, privacy control should be adjusted to the situation, and privacy protection methods should be developed that adapt to the data, the application domain, and the data processing technology.
In this study, we focused on privacy-preserving human and human activity recognition. With the development of affordable, high-performance cameras and IoT devices equipped with various sensors, many applications are emerging that provide convenience by analyzing the various types of big data collected. It is difficult to find meaningful information in big data until it is analyzed. However, the results obtained with advanced computing technologies such as deep learning raise new security problems. In human and human activity recognition, extracting information about a particular individual and his or her activities can infringe on privacy. Therefore, privacy-preserving approaches are important in human and human activity recognition. Because there is no single best solution for privacy protection, it should be studied in parallel with the expansion of deep learning applications. In this paper, privacy-preserving approaches and related issues in cutting-edge research were investigated.