Ethics statement
This investigation was conducted according to the principles expressed in the Declaration of Helsinki. All participants provided written informed consent prior to participation, and the study was approved in advance by the Institutional Review Board of the Ulsan National Institute of Science and Technology (UNISTIRB-14-01-C).
Experimental stimuli
Two sets of video stimuli and one game stimulus were used in this experiment. The first set of video stimuli was used to measure basic emotional states for developing the basic neurophysiological index assessment model, while the second set, consisting of TV commercials, was used to evaluate the model. The game stimulus was used to focus participants’ attention.
For the development of the TV commercial evaluation index, there were four types of stimuli. First, a neutral video containing green-colored scenery and nature sounds was used as a baseline neutral stimulus. Second, two videos were presented to induce emotional responses and to develop a happiness index for each participant; they showed the 2002 World Cup Korean soccer game (H1) or a dozing and smiling baby (H2). For the analysis, we selected the more appropriate of these two videos depending on participants’ verbal responses after viewing. Third, another pair of video clips was presented to induce surprise; these showed a new way to pour beer (S1) or magnetic sand (S2). Again, one was selected based on the participants’ responses. Fourth, a moving dot game was used to focus participants’ attention [29]. In this game, participants had to identify a dot moving faster than the other dots, which moved at a constant speed and direction. During the game, participants were asked to indicate the dot’s location by pressing one of four keyboard buttons (Q, W, A, or S) mapped onto the quadrants of the screen (Figure 1). To fully engage their attention in the game, participants were informed that they would receive an extra reward if they achieved the highest score.
The second set of video stimuli, used for the TV commercial evaluation, was selected from smartphone commercials released after 2013. Six commercials from different brands were selected, all with similar running times (approximately 30 s). We excluded the two commercials from the most widely used brands in Korea to reduce the influence of attachment to a particular brand. Accordingly, four commercials were used in this experiment (C1, C2, C3, and C4). Each commercial had different content and delivered its message in a different way. For instance, C1 emphasized the function of the rear button through a metaphor, showing several couples in which the man concealed a present box behind his back, followed by the message, ‘We always hide what is precious behind us.’ C2 stressed the camera function and color variations in a lively morning kitchen scene. C3 was a typical informative commercial, presenting only the design and functional aspects of the product without any affective scenes. C4 demonstrated features such as ‘oscillatory sound’ in a club scene, with the music volume changing frequently to highlight this feature. We collected each commercial’s data from a publicly available database (http://www.tvcf.co.kr/).
Participants and experimental tools
Ten men and ten women (mean age 22.9 ± 1.41 years) with normal or corrected-to-normal vision and no reported neurological disorders participated in this study. During the experiment, scalp EEG signals were recorded using a wireless EEG headset (EPOC, Emotiv Inc., San Francisco, USA) from 14 electrodes located at AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4 (in accordance with the International 10/20 system). EEG signals were sampled at 128 Hz and referenced to the average of the common mode sense (CMS, active electrode) and driven right leg (DRL, passive electrode) electrodes. The voltage between the electrodes and the reference was amplified with a gain of 60 to 100 dB. The impedance of all electrodes was kept below 5 kΩ. All visual stimuli were presented on a 27-in. monitor (QH270-IPSM, Achieva Korea, Incheon, Korea) positioned approximately 60 cm from the participants’ eyes. Auditory stimuli were presented via a centrally positioned speaker. During the moving dot game, each participant used the keyboard to perform the task (that is, button presses).
We also used a questionnaire for the cognitive assessment and to collect behavioral responses to the presented commercials. This questionnaire originated from previous research [30] but was modified for the purpose of this study. We examined each participant’s preference for the commercials and the brand names, as well as their degree of purchase intention. Preference for the commercials was assessed by ranking them. Preference for the brand names and the degree of purchase intention were assessed on a 7-point Likert scale (1 to 7). In addition, participants were asked whether they had seen each commercial before, whether they knew the brand name, and whether they had previously purchased products of that brand. Finally, they were asked to rank the commercials according to how well they remembered them. On the next day, participants ranked the commercials again according to how well they recalled them after 1 day. In this study, the preference for commercials, the degree of short-term memory, the recall rate, and the degree of purchase intention were selected for further analysis. All data were analyzed using MATLAB (MathWorks, Natick, MA, USA).
Experimental procedure
Before the experiment, each participant was seated in a comfortable chair for a few minutes. The EEG headset was set up during this time, and a brief explanation of the experiment was given. The experimental procedure consisted of five sessions in total: four sessions for EEG recording and one session for the questionnaire (Figure 2). In the first session, the neutral video of green scenery with nature sounds was presented to establish the neutral state condition. In the second session, we used the four emotion-inducing video stimuli (two for happiness and two for surprise) to elicit the corresponding emotions. Participants were asked to indicate their emotion immediately after viewing each video clip to confirm that the evoked emotion matched our expectation. They selected their experienced emotion from seven emotional states (Ekman’s six basic emotions [31] plus ‘not classified’). The rate at which the intended emotion was reported was 95% for H1, 50% for H2, 55% for S1, and 65% for S2 (H1: happiness stimulus 1, H2: happiness stimulus 2, S1: surprise stimulus 1, S2: surprise stimulus 2). From these, we selected H1 and S2 as the basic stimuli.
In the third session, participants played the moving dot game, which was designed to focus their attention. They were motivated by the promise of an extra reward for achieving the highest accuracy and completing the game in the shortest amount of time. In the fourth session, the four commercials were presented in random order, and participants simply viewed them. In the final session, participants completed the questionnaire to evaluate each commercial. They ranked their preferences across all commercials and were asked to write down, from memory, the specific products advertised in the commercials to evaluate the short-term memory index of each commercial. Each ranking was converted to points: 4, 3, 2, and 1 for the first, second, third, and fourth ranks, respectively. One day after the experiment, participants ranked how much they remembered from each commercial to construct a recall rate index.
Feature extraction
EEG signals from 18 participants were further analyzed, as two participants’ data were contaminated by external noise. The EEG signals were band-pass filtered (0.5 to 100 Hz) and re-referenced to a common average reference to reduce potential shifts due to external artifacts. The EEG data were then split into non-overlapping 0.25-s windows. The data in each window were analyzed in six frequency bands using the short-time Fourier transform (STFT): delta (0.5 to 4 Hz), theta (4 to 8 Hz), alpha (8 to 12 Hz), low beta (12 to 20 Hz), high beta (20 to 30 Hz), and gamma (30 to 50 Hz). In each frequency band, the power spectral density from the spectrogram was averaged across frequency bins (0.5-Hz resolution). These steps yielded a total of 84 features (14 channels × 6 frequency bands) for each time window.
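As an illustration of this pipeline, the following is a minimal sketch in Python (the original analysis was performed in MATLAB). The array layout, function names, the zero-padding used to obtain 0.5-Hz frequency bins, and the filter’s upper cutoff (capped below the Nyquist frequency of 64 Hz at the 128-Hz sampling rate) are our assumptions, not taken from the original analysis code.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128                       # sampling rate (Hz)
WIN = int(0.25 * FS)           # 0.25-s windows, no overlap (32 samples)
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 12),
         "low_beta": (12, 20), "high_beta": (20, 30), "gamma": (30, 50)}

def band_power_features(eeg):
    """eeg: (n_channels, n_samples) array -> (n_windows, n_channels * 6) features."""
    # Band-pass filter; upper cutoff kept below the Nyquist frequency (64 Hz at fs = 128 Hz).
    b, a = butter(4, [0.5 / (FS / 2), 50 / (FS / 2)], btype="band")
    eeg = filtfilt(b, a, eeg, axis=1)
    # Re-reference to the common average across channels.
    eeg = eeg - eeg.mean(axis=0, keepdims=True)
    n_ch, n_samp = eeg.shape
    feats = []
    for start in range(0, n_samp - WIN + 1, WIN):
        seg = eeg[:, start:start + WIN]
        # Zero-padded FFT so that frequency bins are spaced 0.5 Hz apart.
        psd = np.abs(np.fft.rfft(seg, n=2 * FS, axis=1)) ** 2
        freqs = np.fft.rfftfreq(2 * FS, d=1.0 / FS)
        row = [psd[ch, (freqs >= lo) & (freqs < hi)].mean()
               for ch in range(n_ch) for lo, hi in BANDS.values()]
        feats.append(row)
    return np.asarray(feats)   # 14 channels x 6 bands = 84 features per window
```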
Feature selection
One-way analyses of variance (ANOVAs) and cross-validation were used to identify the optimal feature set for each participant’s cognitive/affective state classification models. Initially, the 84 features were analyzed with separate one-way ANOVAs. Neural features collected for each of happiness, surprise, and attention were compared with those collected in the neutral session. From the ANOVA results, the feature with the highest F-value was selected first. Additional features were then added one at a time, each chosen to maximize classification accuracy in a tenfold cross-validation. This greedy selection procedure was repeated until the cross-validation accuracy began to decrease. The process was performed for each participant and each neurophysiological index; accordingly, the size of the optimal feature set differed between participants and between neurophysiological indices. Finally, we obtained three EEG feature sets for each participant: neutral-happiness, neutral-surprise, and neutral-attention.
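A sketch of this per-participant selection loop is shown below, assuming the 84 features and binary labels (neutral vs. one target state) are already arranged as NumPy arrays and using scikit-learn’s LDA as the cross-validation classifier. The function and variable names are illustrative and not taken from the original code.

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def select_features(X, y):
    """Greedy forward selection seeded by the highest one-way ANOVA F-value.

    X: (n_windows, 84) band-power features; y: binary labels (neutral vs. target state).
    """
    f_vals, _ = f_classif(X, y)               # one-way ANOVA F-value per feature
    selected = [int(np.argmax(f_vals))]       # start from the highest F-value
    best_acc = cross_val_score(LinearDiscriminantAnalysis(),
                               X[:, selected], y, cv=10).mean()
    remaining = set(range(X.shape[1])) - set(selected)
    while remaining:
        # Try each remaining feature and keep the one that raises accuracy the most.
        scores = {f: cross_val_score(LinearDiscriminantAnalysis(),
                                     X[:, selected + [f]], y, cv=10).mean()
                  for f in remaining}
        best_f, acc = max(scores.items(), key=lambda kv: kv[1])
        if acc <= best_acc:                    # stop once accuracy no longer improves
            break
        selected.append(best_f)
        best_acc = acc
        remaining.remove(best_f)
    return selected, best_acc
```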
Classification analysis
After the optimal feature set was found for each cognitive or emotional state, a state classifier model was trained to classify brain signals into one of two states: neutral vs. happiness, neutral vs. surprise, or neutral vs. attention. Fisher’s linear discriminant analysis (FLDA) is a well-known classification method that determines an optimal hyperplane separating the data space according to class [32,33]. FLDA aims to determine a projection vector, w, that maximizes the between-class scatter and minimizes the within-class scatter. The Fisher criterion function, J(w), is defined as follows:
$$ J(w) = \frac{w^T S_B w}{w^T S_W w} $$
(1)
where
$$ S_B = \sum_c \left(\mu_c - \overline{x}\right)\left(\mu_c - \overline{x}\right)^T $$
(2)
is the between-class scatter matrix, and
$$ S_W = \sum_c \sum_{i \in c} \left(x_i - \mu_c\right)\left(x_i - \mu_c\right)^T $$
(3)
is the within-class scatter matrix. Here, c denotes the class label, \( \mu_c \) denotes the mean of the data from class c, and \( \overline{x} \) denotes the mean of the data from all classes. The projection vector w is the vector that maximizes J(w).
The trained classifiers were used to estimate state variations while participants watched the TV commercials. The EEG signals for each commercial were divided into a series of segments (window length: 0.25 s), and the selected feature sets were extracted for each segment. The classifiers were applied to these features, yielding a posterior probability of a particular cognitive or emotional state. These posterior probabilities were used as neurophysiological indices of the temporal patterns of happiness, surprise, and attention. The averages of these indices were calculated and compared across the commercials. Finally, the commercials were subjected to an elapsed-time analysis in terms of their scene and auditory structures.
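A sketch of how such a posterior-probability index could be computed per commercial is shown below, using scikit-learn’s LDA (whose predict_proba returns class posteriors under its Gaussian model) as a stand-in for the trained FLDA classifiers; the function and argument names are illustrative assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def index_time_course(clf, X_commercial, selected, state_label=1):
    """Posterior-probability index for one commercial.

    clf: LDA classifier already trained on neutral-vs-state windows;
    X_commercial: (n_windows, 84) features from 0.25-s segments of the commercial;
    selected: indices of the participant's optimal feature set.
    Returns the per-window index and its average over the commercial.
    """
    proba = clf.predict_proba(X_commercial[:, selected])
    state_col = list(clf.classes_).index(state_label)
    index = proba[:, state_col]          # P(state | EEG window)
    return index, index.mean()

# Example usage (hypothetical variable names):
# clf = LinearDiscriminantAnalysis().fit(X_basic[:, selected], y_basic)
# happiness_course, happiness_index = index_time_course(clf, X_commercial, selected)
```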