Citation Information : Acta Neurobiologiae Experimentalis. Volume 79, Issue 4, Pages 421-431, DOI: https://doi.org/10.21307/ane-2019-039
License : (CC-BY-4.0)
Received Date : 11-March-2019 / Accepted: 22-September-2019 / Published Online: 10-January-2020
In an effort to clarify the concept of “BCI illiteracy”, we investigated the possibilities of attaining basic binary (yes/no) communication via brain-computer interface (BCI). We tested four BCI paradigms: steady-state visual evoked potentials (SSVEP), and tactile, visual, and auditory evoked potentials (P300). The proposed criterion for assessing the possibility of communication is based on the number of correct choices obtained in a given BCI paradigm after a short calibration session, without prior training. In this study users answered 20 simple “yes/no” questions. Fourteen or more correct answers rejected the null hypothesis of random choices at
With substantial progress in electroencephalogram (EEG) recording and analysis technologies, brain-computer interfaces (BCI) are on the verge of becoming a useful tool in a range of applications from industry, medicine and rehabilitation to gaming and entertainment (Brunner et al., 2015). In this light, one of the major open questions, namely “Could anyone use a BCI?”, becomes more and more important. This question was used as the title of a book chapter (Allison and Neuper, 2010), where the authors quoted a common belief that “about 20% of subjects are not proficient with a typical BCI system”. This widely quoted figure underlies the emergence of the term “BCI illiteracy” (Blankertz et al., 2010; Edlinger et al., 2015; Carabalona, 2017). However, the lack of a definition and the vagueness of this term do not justify its widespread application to estimates of the “rate of idiopathic BCI failures”. This rate is usually counted as the ratio of participants who “failed” to use or calibrate specific BCIs with different setups, parameters, stimuli rendering and algorithms used in their respective studies. However, the very notion of “fail” in this context is far from obvious.
A fail is usually understood as a case in which a participant did not achieve a given performance threshold in a given time using a given BCI system operating in a given paradigm. Let us take a closer look at the potential causes of failures.
It is indeed possible that the structure of a given brain is such that the neural responses evoked in a given BCI paradigm are not readable from the scalp – but the given paradigm should always be referred to in discussion. For example, a verified inability of a subject to generate a P300 response to visual stimuli might be called “visual P300-BCI illiteracy”, in contrast to “BCI illiteracy”, which also suggests an inability to generate responses in other types of BCI.
Most BCIs do not work efficiently from the start, and require either a short calibration in P300 and SSVEP (steady-state visual evoked potential) paradigms or often longer training in MI-BCI (motor imagery-brain computer interface). While the lengths of these sessions are usually fixed or limited, it is possible that in some cases longer training or repeating the whole session on another day may turn a fail into successful communication.
The quality of EEG recordings, rendering of the stimuli and the efficiency of the signal processing algorithms used in different BCI systems vary dramatically with respect to the robustness to artifacts and the ability to decode answers from noisy EEG. However, it is hard to estimate the percent of failures that should be attributed to the researchers rather than participants.
There are several measures of performance and speed of a BCI. However, there is no general agreement in relation to which of them (and above which threshold) should be used as an indicator of achieving a successful communication.
Some researchers have referred to 70% accuracy as “sufficient for a comfortable BCI operation” (Kübler et al., 2004; Vidaurre and Blankertz, 2010; Shu et al., 2018). However, over a decade ago, Müller-Putz et al. (2008) pointed out that accuracy alone is not a valid indicator. For example, 7 or more out of 10 correct binary (yes/no) choices occur in approximately 1 in 6 cases when each answer is decided by a coin toss. Under such a criterion, over 17% of users would be counted as “successfully communicating” if the classifier were replaced by a random number generator. These remarks, also raised in other studies (Müller-Putz et al., 2008; Billinger et al., 2012; Combrisson and Jerbi, 2015), have seldom been taken into account in published works.
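The chance level quoted in this example can be verified directly from the binomial distribution. The sketch below uses only the Python standard library; the function name `chance_tail` is ours and serves purely as an illustration.

```python
from math import comb


def chance_tail(k: int, n: int, p: float = 0.5) -> float:
    """Probability of at least k successes in n independent trials,
    each succeeding with probability p (p = 0.5 models a coin toss)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 7 or more correct answers out of 10 binary questions, answered by coin toss
print(round(chance_tail(7, 10), 4))  # 0.1719
```

The result, 176/1024, is slightly above 17%, which is close to 1 in 6.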
Most BCI studies, in line with the current industry trends and ethical considerations, are performed on healthy subjects. However, the first BCIs were intended primarily for neurological patients, and this application may still be viewed as the most important for reasons other than just business. The first target group consisted of patients with neurodegenerative diseases (such as amyotrophic lateral sclerosis, ALS, and multiple sclerosis, MS), leading to the loss of voluntary muscle control and, in effect, making any form of communication impossible. The second group was patients after severe brain injury (i.e. disorders of consciousness, DoC, and locked-in syndrome, LiS). For the first group, BCI provides a chance for communication and rehabilitation (Beukelman et al., 2011), and for the latter group this technology could improve diagnosis and allow for establishing communication (Luauté et al., 2015). Recently, several studies of BCI performance in neurological patients showed promising results (Lule et al., 2013; Lugo et al., 2014; Sellers et al., 2014; Annen et al., 2018; Heilinger et al., 2018). However, constructing a BCI that works effectively with such patients remains a major challenge.
In this study we tested different BCI paradigms on a group of healthy users and, to verify their applicability in a clinical setting, on a single patient recovering from DoC. Apart from that, we propose and implement the basic and most important steps towards the assessment of “BCI illiteracy”: giving each participant the opportunity to test several different BCIs in different paradigms and modalities; reporting all the achieved efficiency measures; counting a successful communication as rejection of the hypothesis of random choices; and using state-of-the-art algorithms and hardware for EEG recording and stimuli rendering.
The most popular and well-researched BCIs are based upon P300, steady-state visual evoked potentials (SSVEP) or motor imagery (MI). Responses can be evoked by stimuli delivered in visual, auditory or tactile modalities. Currently, the most effective BCIs are visual P300 and SSVEP, but these paradigms require complete visual attention, impeding the possibility of using vision at the same time. MI is free from these constraints, but its application is far more complicated than the former two; it usually requires extensive prior training for each subject and the achieved information transfer rates (ITRs) are lower than are achievable in the other paradigms (Nicolas-Alonso and Gomez-Gil, 2012). BCIs that do not involve vision can also be based on auditory or tactile stimuli.
We tested four BCIs. The first two were based on the most popular paradigms – visual P300 and SSVEP. As for SSVEP, we used only high frequencies, due to the possibility of inducing photoepileptic attack and annoyance related to low frequency flickers (Fisher et al., 2005). To compensate for a possible dependence on the stimulus modality, we also tested P300-BCI with auditory and vibrotactile stimuli. To facilitate the task and shorten the calibration time we chose the most basic binary BCI setups (yes/no). To stay as close as possible to the “real world” scenario, we used a compact and comfortable EEG headset with a built-in wireless amplifier and water-based electrodes and also included in the testing one representative of an important target group – deeply paralysed persons, who cannot communicate via traditional channels or assistive technologies. This group includes patients after severe brain injury, who awake from coma and evolve into DoC. One of the hallmarks of emergence from DoC is a functional “yes/no” communication. However, due to the deep motor impairments and problems with regaining language functions in these patients, these hallmarks may elude traditional behavioral assessments like (Giacino et al., 2004) – hence the crucial role of BCIs in DoC (Dovgialo et al., 2019).
EEG was recorded using an 8-channel wireless EEG headset by BrainTech Ltd. (Fig. 1), with water-based electrodes in positions corresponding to C3, Cz, C4, P3, Pz, P4, O1 and O2 from the 10–20 international system, with reference at M1 and ground at M2.
Electrode impedance was monitored online at 125 Hz and kept below 20 kΩ through the duration of procedures. The signal sampling rate was 500 Hz. The design and execution of experiments were based upon the BrainTech BCI framework. Electrode impedances, current potentials and partial classification results were monitored online.
Four BCI paradigms were used: Visual P300, Auditory P300, Vibrotactile P300, SSVEP in high frequencies.
Thirty healthy volunteers, 20 females (mean age 24) and 10 males (mean age 22), tested each of these paradigms. The order of BCI sessions was randomly chosen for each subject. Experiments were randomly split into two sessions for each volunteer. Sessions were performed on different days to avoid fatigue of the participants. Each session consisted of calibration and communication. Additionally, one patient (14 years old) from a model hospital for children with severe brain damage, The Alarm Clock Clinic (http://www.klinikabudzik.pl/en), tested the P300-BCIs. This patient, who was in a coma after brain injury, was diagnosed at the time of the experiment with emergence from the minimally conscious state (eMCS) (Giacino et al., 2002) by means of the Coma Recovery Scale-Revised (Giacino et al., 2004), and could communicate via head movements.
The study was approved by the Rectors Committee for Ethics of Research with Human Participants at the University of Warsaw, and informed consent was obtained from the volunteers’ and patients’ legal representatives.
The first three paradigms from the previous section are based upon the same kind of expected response to stimuli, delivered in different modalities – the P300 evoked potential.
In the classical P300 speller, dozens of options (letters or groups of letters) are highlighted in random sequences, and the participant is instructed to count the blinks of the letter of interest. Such conditions conform to the oddball paradigm, where presentations of sequences of repetitive stimuli are infrequently interrupted by a deviant stimulus. Infrequent occurrences of the letter of interest are the deviant stimuli, while all the other blinks are treated as the repetitive stimuli, which the subject ignores. However, in a binary “yes or no” communication there are only two stimuli to choose from, which does not produce the oddball conditions necessary to elicit P300. Therefore, we implemented a scheme (called “MMN presentation paradigm” (Jin et al., 2015)) where the two infrequent stimuli – targets corresponding to the possible choices – are embedded in a stream of a frequently repeated stimulus – a distractor of the third type, which the participant ignores. Targets occur with a probability of 1/6.
The words “yes” and “no” were written (in Polish) in black on a white rectangle. Changes of the text color from black to red correspond to the frequent stimulus (distractor), while the actual choices (the target stimulus) were made by counting the appearances of an image of the face of Albert Einstein (Kaufmann et al., 2013) on top of the desired option. The duration of these stimuli was set to 100 ms, with the inter-stimulus interval (ISI) randomized between 200 and 350 ms.
Three different tones, C4 (261.63 Hz), E4 (329.63 Hz) and G4 (392 Hz), played on different instruments, were combined with sound spatialization (left, center, right). Left – the lowest tone (synthesized electric piano/synth) – was the rare stimulus (target) for the “yes” selection, center – the middle tone (acoustic piano) – was the frequent stimulus (distractor) and right – the highest tone (harpsichord) – was the rare stimulus (target) for “no”. Amplitudes were normalized to perceived equality of the loudness levels. ISI was randomized between 400 and 700 ms, with a sound duration of 200 ms. Stimuli were presented using plug-in Bose Soundtrue earphones at a level comfortable for the subject. Additionally, the words “yes” and “no” were statically displayed on the screen as a hint for the lateralization during sessions.
Vibrotactile stimulators were taped to left and right hands and to the neck of the subject. The left-hand tactor served as the rare stimulus (target) for “yes”, neck stimuli served as the frequent neutral stimulus (distractor) and the rare stimulus (target) for “no” was on the right hand, as in (Brouwer and Van Erp, 2010). During sessions the words “yes” and “no” were also displayed on the screen. Stimulation duration was 200 ms with ISI randomized between 300 and 450 ms.
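The stimulation timing of the three P300 variants can be summarized in a short sketch. It generates one stimulation stream per modality from the durations and ISI ranges reported above. Several points are our assumptions and not details given in the text: each of the two targets is drawn independently with probability 1/6 (one possible reading of the target probability), the ISI is uniformly distributed, and it is counted from stimulus offset to the next onset. The names `TIMING_MS` and `make_schedule` are hypothetical.

```python
import random

# Stimulus duration and ISI range in milliseconds, as reported per modality.
TIMING_MS = {
    "visual": {"duration": 100, "isi": (200, 350)},
    "auditory": {"duration": 200, "isi": (400, 700)},
    "vibrotactile": {"duration": 200, "isi": (300, 450)},
}


def make_schedule(modality, n_stimuli, seed=None, p_target=1 / 6):
    """Return a list of (onset_ms, label) pairs for one stimulation stream.

    Assumptions (not stated in the text): each target ("yes", "no") has
    probability p_target, the distractor takes the remainder, and the ISI
    is uniform and measured from stimulus offset to the next onset.
    """
    rng = random.Random(seed)
    timing = TIMING_MS[modality]
    isi_min, isi_max = timing["isi"]
    labels = ["yes", "no", "distractor"]
    weights = [p_target, p_target, 1 - 2 * p_target]

    onset = 0.0
    schedule = []
    for _ in range(n_stimuli):
        label = rng.choices(labels, weights=weights)[0]
        schedule.append((onset, label))
        onset += timing["duration"] + rng.uniform(isi_min, isi_max)
    return schedule


for onset_ms, label in make_schedule("auditory", 5, seed=0):
    print(f"{onset_ms:8.1f} ms  {label}")
```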
Steady-state visual evoked potentials (SSVEPs) can be recorded in response to visual stimulation with a specific frequency, within the commonly defined frequency ranges: low (below 12 Hz), medium (12–30 Hz) and high (above 30 Hz) (Herrmann, 2001). Flicker at low or medium frequencies can be annoying and can trigger an epileptic seizure in vulnerable subjects (Fisher et al., 2005). Stable generation of high-frequency stimuli cannot be achieved for an arbitrary frequency on a computer screen, and requires hardware control by dedicated electronics (Durka et al., 2012). To conserve the flexibility of rendering messages in the flickering fields, we used the Blinker device from BrainTech Ltd. (https://braintech.pl/blinker/?lang=en). It consists of 320 highlightable fields, with the flickering frequency of each field controlled separately by dedicated electronics. To highlight the “yes” and “no” rectangles, the Blinker was configured to create two 6.5 × 4.5 cm flickering fields, observed from about 100 cm.
Calibration was performed in short blocks. Sound and brief visual instructions were presented before every block. During the calibration sessions, participants were asked to concentrate on: the face appearing over one of the words (yes/no) in the visual P300, sound corresponding to the yes/no selection in the auditory P300, vibrotactile stimulation corresponding to the yes/no selection in the vibrotactile P300, and the field displaying “yes”, “no”, or the non-blinking field with instruction in SSVEP-BCI.
After the instruction, the stimulation cycle was started. It consisted of 3–4 target repeats and a pause. Calibration was stopped when 40 target epochs were collected.
The calibration procedure contains blocks of three instructions, asking participants to concentrate her/his attention on the word “yes”, word “no”, and the instruction text. The order was randomized. Before instructions, a 2 second pause allowed participants to rest after the previous stimulation. Texts of the instructions were shown on the top status field and played back as a voice-over. After each instruction, stimulation was started for 6 seconds: the fields with words “yes” and “no” started flickering with different frequencies. After every 2 blocks, the system validated the performance of the current frequency set and, if needed, selected a new set of frequencies to be calibrated. Focusing the participant’s attention on the instruction field allowed the system to collect statistics for the non-control state, when the stimulation was on but the participant was not focusing on any of the control words; this allowed for detection of non-control state. Calibration continued until the system found a set of frequencies which provided AUC≥0.8. AUC stands for the area under the ROC curve, used for the assessment of detection as in Dovgialo et al. (2019).
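For readers unfamiliar with the AUC criterion, the following sketch computes the area under the ROC curve in its rank-based form: the fraction of (attended, non-control) epoch pairs in which the attended epoch receives the higher score. How the system derives a score for each epoch is not specified here, so the example values are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as one half."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical detector outputs for attended and non-control epochs.
attended = [0.9, 0.8, 0.7, 0.4]
non_control = [0.5, 0.3, 0.2, 0.1]
print(auc(attended, non_control))  # 0.9375
```

In this hypothetical case the frequency set would pass the AUC≥0.8 criterion.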
The communication session started directly after each calibration and consisted of answering 20 simple yes/no questions via BCI (e.g. “Is grass blue?”). It was assumed that the respondents knew the right answers. The proportion of expected yes/no responses was 1:1 and the order of questions was randomized for each session. Each session started with text and audio instructions explaining the task. The subject was asked to listen to the question and focus on the target stimuli corresponding to the correct answer. After one second, the cyclic presentation of stimuli started and continued until the classifier returned the selected word. When a word was selected, the stimulation cycle stopped and visual feedback was presented: the background of the selected word was highlighted red (if the answer was incorrect) or green (if the answer was correct).
Classification of visual P300 and SSVEP potentials relied on derivations O1, O2, Pz and Cz. Auditory and vibrotactile paradigms used C3, C4, Cz and Pz.
Raw signal sampled at 500 Hz was filtered online using a 50 Hz notch filter and a 2nd-order Chebyshev type 2 lowpass filter with a 12 Hz cutoff. Epochs from -200 to 900 ms aligned to the stimuli were linearly detrended, baseline-corrected to the interval of 200 ms before the stimuli and decimated by a factor of 21, resulting in a new sampling rate of 23.809 Hz. Channels of the decimated signal were then concatenated into one flat feature vector of length 92 (23 features per channel).
An LDA classifier (sklearn Python library (Pedregosa et al., 2011), sklearn.discriminant_analysis.LinearDiscriminantAnalysis) with automatic shrinkage parameter (adjusted as proposed by Ledoit and Wolf, 2003) was fitted to responses to 40 targets and 40 nontargets. Optimal decision thresholds for the classifier were calculated for each participant, based upon leave-one-out calibration and False Positive Rates.
The classifier issues a final decision based upon two conditions: either each of the stimuli was presented at least 10 times, or there were 3 consecutive decisions above the threshold for one choice and below threshold results for the other choice.
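The stopping rule can be illustrated with a minimal sketch. It assumes, for illustration only, that after each round (one presentation of both targets) the classifier yields one score per choice, and that a forced decision after 10 rounds picks the choice whose mean score exceeds its threshold by the larger margin. The source does not describe how the forced decision is made, and the function name and signature are hypothetical.

```python
def final_decision(scores, thr_yes, thr_no, max_rounds=10, n_consecutive=3):
    """Return "yes", "no", or None (continue stimulation).

    scores: list of (yes_score, no_score) pairs, one pair per round.
    Assumption: one round means one presentation of each target.
    """
    # Early stop: the last n_consecutive rounds all favour the same choice,
    # with its score above threshold and the other score not above threshold.
    if len(scores) >= n_consecutive:
        recent = scores[-n_consecutive:]
        if all(y > thr_yes and n <= thr_no for y, n in recent):
            return "yes"
        if all(n > thr_no and y <= thr_yes for y, n in recent):
            return "no"

    # Forced stop: every stimulus has been presented max_rounds times.
    # The tie-breaking rule below is our assumption, not the authors' method.
    if len(scores) >= max_rounds:
        mean_yes = sum(y for y, _ in scores) / len(scores)
        mean_no = sum(n for _, n in scores) / len(scores)
        return "yes" if mean_yes - thr_yes >= mean_no - thr_no else "no"

    return None


print(final_decision([(0.9, 0.1)] * 3, thr_yes=0.5, thr_no=0.5))  # yes
```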
The SSVEP analysis is loosely based on Ajami et al. (2018). In the first step, the EEG signal was filtered online by a cascade of IIR filters: notch filters at 50, 100, 150 and 200 Hz and a second-order Butterworth bandpass filter with edges at 20 and 60 Hz. The filtered signal was stored in a circular buffer which returned 1-second epochs: every second for classifier training and every 0.5 seconds for feedback/communication.
For each of the frequencies used in stimulation, a design matrix was created from 1-second-long sines and cosines with the stimulation frequency fS, its first harmonic 2fS and subharmonic fS/2. Using this matrix and raw EEG signal, the LASSO model (implementation in the sklearn library, sklearn.linear_model.coordinate_descent.Lasso (Pedregosa et al., 2011), with parameters differing from default: alpha=1, max_iter=1000, warm_start=True, selection=cyclic, fit_intercept=False) was trained for every returned buffer.
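A minimal sketch of such a design matrix, in plain Python, is given below. The 500 Hz sampling rate and the 1-second length follow the text; the column order and the function name `design_matrix` are our choices for illustration, and the subsequent LASSO fit is not reproduced.

```python
import math


def design_matrix(f_s, sampling_rate=500.0, duration_s=1.0):
    """Build a regressor matrix for one stimulation frequency f_s (Hz).

    Rows are time samples; the six columns are sine/cosine pairs at the
    subharmonic f_s/2, the stimulation frequency f_s and the harmonic 2*f_s.
    """
    n_samples = int(round(sampling_rate * duration_s))
    frequencies = (f_s / 2, f_s, 2 * f_s)
    rows = []
    for k in range(n_samples):
        t = k / sampling_rate
        row = []
        for f in frequencies:
            row.append(math.sin(2 * math.pi * f * t))
            row.append(math.cos(2 * math.pi * f * t))
        rows.append(row)
    return rows


X = design_matrix(40.0)
print(len(X), len(X[0]))  # 500 6
```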
A set of candidate frequencies is chosen from the interval 15–45 Hz with step 1–2 Hz, chosen to exclude harmonics. Additionally, to each integer frequency a random number from [-0.1, 1 Hz] was added. Calibration for 2-field BCI finds the two highest frequencies, giving the largest contrast in SSVEP responses. The contrast was measured in terms of the LASSO weights. Additional threshold for these weights was set for detection of non-control state, using epochs recorded when the participant was asked to concentrate on non-blinking fields.
Every 0.5 second a 1-second signal buffer was used to compute LASSO weights. Frequency with the highest LASSO weight was selected as the action chosen in this buffer. When two consecutive buffers returned the same selected frequencies, and LASSO weights for both buffers were higher than the threshold, the choice was accepted, stimulation was stopped and the feedback for the chosen action was presented. Otherwise the system was in non-control state and stimulation was continued.
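The acceptance rule for two consecutive buffers can be sketched as follows. We assume that each buffer has already been reduced to one scalar LASSO weight per stimulation frequency; how the individual coefficients are combined into that scalar is not specified, and the function name `ssvep_decision` is hypothetical.

```python
def ssvep_decision(prev_weights, curr_weights, threshold):
    """Return the selected frequency, or None for the non-control state.

    prev_weights, curr_weights: dicts mapping stimulation frequency (Hz)
    to its LASSO weight in two consecutive 1-second buffers.
    """
    best_prev = max(prev_weights, key=prev_weights.get)
    best_curr = max(curr_weights, key=curr_weights.get)
    same_choice = best_prev == best_curr
    above = prev_weights[best_prev] > threshold and curr_weights[best_curr] > threshold
    if same_choice and above:
        return best_curr  # accept the choice and stop stimulation
    return None  # non-control state: stimulation continues


print(ssvep_decision({38.0: 0.8, 42.0: 0.2}, {38.0: 0.7, 42.0: 0.3}, threshold=0.5))  # 38.0
```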
To assess the results of BCI communication, for each participant and paradigm we computed:

Mean answering time:

$$\bar{T} = \frac{1}{N} \sum_{i=1}^{N} T_i \qquad (1)$$

where $T_i$ – time for answering each question, counted from the start of the stimulation (after the end of the question) until the presentation of feedback, $N$ – number of questions (in this study $N = 20$). Presented in seconds (s).

Information transfer rate (ITR):

$$\mathrm{ITR} = \frac{\log_2 N + P \log_2 P + (1 - P) \log_2 \frac{1 - P}{N - 1}}{\bar{T}} \qquad (2)$$

where $P$ – probability of the correct answer, $N$ – number of BCI classes (in this study $N = 2$, as the choice is binary), $\bar{T}$ – mean answering time from Equation (1), expressed in minutes. The numerator is derived from the work of Shannon (1948) and the denominator is in minutes, hence the units of reported ITR are bits/min.

Accuracy (ACC), assuming everyone knew the answers to the trivial questions:

$$\mathrm{ACC} = \frac{N_{corr}}{N} \cdot 100\% \qquad (3)$$

where $N_{corr}$ – number of correctly answered questions, $N$ – total number of questions (in this study $N = 20$). The units of reported ACC are %.

P-value of a one-sided binomial test on the number of correct yes/no answers, assuming the null hypothesis of answers being random as a coin toss (implementation from the SciPy Python library (Jones et al., 2001)), which was used to distinguish users who were able to communicate (P<0.05) from users who were not (Müller-Putz et al., 2008).
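The measures listed in this section can be reproduced with a few lines of standard-library Python. The sketch assumes the commonly used form of the ITR, bits per selection $\log_2 N + P \log_2 P + (1-P)\log_2\frac{1-P}{N-1}$ divided by the mean answering time in minutes, and computes the one-sided binomial P-value directly, without SciPy. All function names and the example data are hypothetical.

```python
from math import comb, log2


def mean_time(times_s):
    """Mean answering time in seconds."""
    return sum(times_s) / len(times_s)


def bits_per_selection(p, n_classes=2):
    """Information per selection for accuracy p and n_classes choices."""
    bits = log2(n_classes)
    if p > 0:
        bits += p * log2(p)
    if p < 1:
        bits += (1 - p) * log2((1 - p) / (n_classes - 1))
    return bits


def itr_bits_per_min(p, mean_time_s, n_classes=2):
    """Information transfer rate in bits/min."""
    return bits_per_selection(p, n_classes) / (mean_time_s / 60.0)


def accuracy_percent(n_correct, n_total):
    """Accuracy in percent."""
    return 100.0 * n_correct / n_total


def binomial_p_value(n_correct, n_total, chance=0.5):
    """One-sided P-value: probability of n_correct or more hits by chance."""
    return sum(
        comb(n_total, k) * chance**k * (1 - chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


# Hypothetical participant: 18 of 20 correct, 3 s per answer.
times = [3.0] * 20
acc = accuracy_percent(18, 20)
print(acc, "%")
print(itr_bits_per_min(acc / 100.0, mean_time(times)), "bits/min")
print(binomial_p_value(18, 20))
```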
Due to logistic failures related to the availability of participants for prearranged sessions, not all of the participants performed all the planned sessions. Below we summarize the communication efficiency measures, presented in full in Table II. SSVEP BCI was tested by 28 volunteers; 27 of them (96%) could communicate at P<0.05, with an average response time of 3.01 s (mean accuracy 95%, mean ITR 23.31 bits/min). Visual P300 BCI was tested by all 30 participants; 24 of them (80%) were able to communicate at P<0.05, with an average response time of 17.7 s (mean accuracy 89%, mean ITR 2.45 bits/min). Auditory P300 BCI was tested by 27 volunteers; 6 of them (22%) were able to communicate at P<0.05, with an average response time of 39.11 s (mean accuracy 80%, mean ITR 0.47 bits/min). Vibrotactile P300 BCI was tested by 24 volunteers; 4 of them (17%) were able to communicate at P<0.05, with an average response time of 27.75 s (mean accuracy 76%, mean ITR 0.51 bits/min).
The patient from The Alarm Clock Clinic (code 31) managed to communicate at P<0.05 with the vibrotactile and visual P300 BCIs (2 out of 3 paradigms, as SSVEP-BCI was not tested in this case).
Achieved accuracies are presented in Fig. 3 and Table I. Twenty-three participants tested all four BCI paradigms. All of them were able to use at least one BCI paradigm; six of them could use only one. At least two different BCI paradigms were successful for seventeen participants, eight of whom could not work with more than two. Nine participants were able to use at least three BCIs, and all four BCIs performed correctly for one participant.
Moreover, Table II presents mean accuracies (Eq. 3), mean answering times (Eq. 1) and mean ITRs (Eq. 2), calculated for healthy participants who communicated via BCI. Distributions of the parameters are shown in Fig. 3. Fig. 2 presents the cumulative percentage of participants reaching the given accuracy level.
Among the P300-BCIs, only the visual modality has been previously examined on a representative group of users. Results obtained by Edlinger et al. (2015) on 100 participants, who used different spellers, showed that about 1–3% of them could choose no letter out of five and 89% of the 81 participants reached accuracy between 80% and 100%.
As for SSVEP-based BCIs, Edlinger et al. (2015) showed that about 13% of users achieved a score below 60% of correct responses and were not able to use a 4-choice speller based on low frequencies. A similar paradigm was used at the CeBIT fair by Allison et al. (2010), involving 106 participants who were asked to write a few example words with a 5-choice speller based on a low-frequency SSVEP-BCI. Mean accuracy was 95.78% and only a few participants reached a score below 65%. In a follow-up study, a high-frequency SSVEP-BCI was also tested on 86 participants (Volosyak et al., 2011). Mean accuracy in low frequencies was 92.26%, and 2.33% of participants failed. As for high frequencies, only 56 participants were able to control the system, with an accuracy of 89.16%, yielding 34.88% of failures.
This handful of studies were explicitly dedicated to the assessment of “BCI illiteracy”. However, due to problems with the reporting of performance and an undefined notion of “sufficient performance”, discussed in the Introduction, meta-analysis of the results reported in these and other BCI-related studies is practically impossible.
To pave the way towards a statistically and neurophysiologically complete assessment of the notion of “BCI illiteracy”, we propose: solid criteria for counting “fails”, based upon rejection of the statistical hypothesis of random choices (non-communication); and giving each participant a chance to test BCI in several modalities and paradigms.
In the current study, BCIs based on SSVEP and visual, auditory and vibrotactile P300 were tested on a group of 30 participants. Of these, 80% could communicate at P<0.05 with visual P300-BCI, 30% with auditory P300-BCI, 17% with vibrotactile P300-BCI and 97% with SSVEP. One out of 30 participants managed to communicate with all four paradigms, nine with at least three, 18 with two or more. Finally, all participants achieved above-random communication in at least one paradigm. Moreover, P300 BCIs were tested on one neurological patient who communicated via two out of three of the tested paradigms.
It must also be clearly stated that none of the participants were given a second chance in this study. On the contrary, we noticed some suboptimal elements of the procedures, which could deteriorate the performance of first-time users. For example, detection of the SSVEP response began directly after the question, leaving no time to find the answer, so the first several hundreds of milliseconds usually contained no response. Another interesting effect was revealed by comparing the subjects’ performance during the calibration and following communication sessions. In the visual P300, users usually performed well during the calibration, while in the following communication session they probably grew tired causing the appearance of the alpha activity, deteriorating the detection of P300. On the contrary, in the auditory P300, results of the communication session were sometimes better than the calibration, and users reported that after only several repetitions they learned to efficiently distinguish the desired stimuli.
In the second part of the experiment we tested the possibility of communication via BCI in a clinical setting. We chose a patient recovering from DoC, a 14-year-old boy who before the accident was a brilliant student with no impairments. Owing to good rehabilitation progress he was regaining both motor and cognitive functions. At the time of the BCI test he was able to communicate using head movements, so he was diagnosed as eMCS. He tested three different BCI paradigms: visual, auditory and tactile. In two of them – visual and tactile – he achieved high accuracies: 90% and 85%, respectively. This suggests that with these two BCIs he could communicate successfully. This was probably due to the quick progress of rehabilitation and the above-mentioned high cognitive functioning before the accident; also, during the procedures he was very conscientious, ambitious and motivated.
This confirmed the applicability of the tested system in clinical settings. Our result is also in line with other studies (Lule et al., 2013; Lugo et al., 2014; Sellers et al., 2014; Annen et al., 2018; Heilinger et al., 2018) that showed the capacity for using BCI after severe brain injury with high accuracy for at least one patient in the respective experimental groups. Nevertheless, BCI studies in DoC patients are significantly more complicated than tests with healthy users, due to a multitude of ethical, technical and logistic issues (cf. Dovgialo et al., 2019).
As discussed above, the concept of “BCI illiteracy” lacks a commonly accepted definition, as well as solid relevant research. The presented results indicate that each of the 30 participants, as well as the eMCS patient, was able to communicate via BCI, given the chance to test more than one paradigm. Apart from advocating a statistical approach to the notion of “successful communication” and the necessity of testing BCIs in different paradigms and modalities, we propose to summarize the current state of the art in a paraphrase of de Finetti’s (1992) provocative statement: BCI illiteracy does not exist.
This research was partly supported by the NCBiR (POIR.01.01.01-00.0573/15-00) and Polish National Science Centre (UMO-2015/17/N/ST7/03784) grants, awarded to BrainTech Ltd. and the University of Warsaw, respectively.
We thank the “Akogo?” foundation of Ewa Błaszczyk and the staff of The Alarm Clock Clinic, as well as the parents of the patient for understanding the importance of the research and kind cooperation.
Percentage of participants who achieved at least the communication accuracy percentage marked on the horizontal axis in each of the BCI paradigms (A – Auditory P300, B – Vibrotactile P300, C – Visual P300, D – SSVEP). Light gray – accuracy level corresponding to