Contrastive Representation Learning: A Framework and Review

Contrastive Learning has recently received interest due to its success in self-supervised representation learning in the computer vision domain. However, the origins of Contrastive Learning date as far back as the 1990s and its development has spanned across many fields and domains including Metric Learning and natural language processing. In this paper, we provide a comprehensive literature review and we propose a general Contrastive Representation Learning framework that simplifies and unifies many different contrastive learning methods. We also provide a taxonomy for each of the components of contrastive learning in order to summarise it and distinguish it from other forms of machine learning. We then discuss the inductive biases which are present in any contrastive learning system and we analyse our framework under different views from various sub-fields of Machine Learning. Examples of how contrastive learning has been applied in computer vision, natural language processing, audio processing, and others, as well as in Reinforcement Learning are also presented. Finally, we discuss the challenges and some of the most promising future research directions ahead.


Introduction
Lifelogging is the automatic gathering of digital records or logs about the activities, whereabouts and interactions of an ordinary person doing ordinary things as part of her/his ordinary day. Those records are gathered by the person, for the exclusive use of the person, and are not generally shared. Lifelogs are a personal record which can be analysed either directly by the person collecting the data, or by others [16], in order to gain insights into long-term behaviour and trends for wellness or behaviour change, as well as to support searching or browsing for specific information from the past.
Lifelogging as a practical activity has been around for many years and has matured as the technology to ambiently and passively capture daily activities has evolved [3]. Technologies for capturing lifelog data are wide ranging and well documented and can be broadly classified into two groups. The first is wearable or on-body devices such as wearable cameras, location trackers or physiological trackers for heart rate, respiration, etc. The second is off-body logging, where sensors form part of our environment, like passive IR sensors for presence detection and contact sensors on doors and windows in the home. Off-body logging would also include using software such as measures for cumulative screentime viewing, productivity at work or online media consumption. Whichever lifelog technology or technologies may be used by a person, it is when these are combined and fused together that we get the best insights into the person, as it is well accepted that so many aspects of our lives interact with, and depend on, each other.
In a recent article by Meyer et al. [13] the authors highlighted several current issues for longer term self-tracking. Some of these are technical issues including incompleteness of data leading to data gaps, implicit tracking with secondary sources such as social networks and online services, and multiple interpretations of our data, beyond behaviour support. There are also issues of self-tracking for secondary users such as children or people with special needs, with consequent ethical, legal, and social implications [6].
If we regard a lifelog collection as a multimedia or a multimodal artifact then it can take advantage of progress made in many other areas of multimedia analysis, such as computer vision techniques to analyse images from wearable cameras [18]. Progress in areas like computer vision has depended upon the easy availability of large datasets on which new ideas can be evaluated and compared to previous work, and initiatives like ImageNet have helped to catalyse these developments. Yet when it comes to the general availability of lifelog collections, these are much rarer precisely because the data is personal. In related research areas which use these same technologies, like using wearable sensors for sleep or gait analysis in clinical settings [7] or using wearable cameras for measuring exposure to different food types [15], lifelog data collections do exist but in these cases the wearers are anonymised. In lifelogging it is the accumulation of data drawn together from across different sources and then fused together that makes the lifelog, and that does not reconcile well with the idea of anonymisation.
In this paper we provide a brief review of past work on using keystroke information for user authentication, for identifying different stages of writing strategies, and for measuring stress and emotion. We advocate for greater use of keystroke dynamics in lifelogging and we describe a dataset of longitudinal keystroke and sleep data gathered from 4 participants over a period of more than 6 months. We describe an analysis of this dataset examining its daily consistency over time both within and across participants, and we address the anonymisation of this data by releasing it in aggregated form which allows within-participant and cross-participant comparisons. The paper is organised as follows: in the next section we provide an overview of keystroke dynamics and then we describe the dataset we collected. We then provide an analysis of this data showing its consistency over time and its comparison to sleep score data. Finally, in our concluding section we summarise the case for greater use of keystroke dynamics in lifelogging and point to future work.

Keystroke Dynamics
In 2009 Stephen Wolfram reported that he had been using a keystroke logger that collected a record of his every keystroke for the previous 22 years. This was in the form of the key pressed and the date and time of pressing. By 2012 this had grown to be a record of 100 million keystrokes and from all this he was able to generate some interesting visualisations on usage and on his life, such as the one shown in Figure 2. This shows his rather interesting work patterns: he basically works all day and evening, stopping at about 3AM before resuming at about 10AM the following day, with a break of a couple of hours, sometimes, in the evening for dinner. We can also see his various trips where he switched to local timezones, such as his Summer of 2008 spent in Europe, and there are other interesting facts, like that the average fraction of keys he types that are backspaces has consistently been about 7%. While this kind of raw visualisation and analysis may be interesting, it is only when we add detailed timing information, like recording keystroke times to the nearest millisecond so that we can look at inter-keystroke times, i.e. the time needed to type two or more adjacent characters, that we can get different kinds of insights into participants.
The original application for keystroke dynamics with accurate timing information was as a form of user authentication, and work in this area goes back over 4 decades, from 1980 onwards, with regular re-visits to the topic [1,2,8]. The security application for keystroke dynamics is based on the premise that each of us has our own unique timing information as we interact with GUIs, and that includes the timings of our keystrokes, our mouse movements, and our mouse clicks [4]. An advantage of using keystroke dynamics for security and authentication would be that we would never need to remember passwords, and passwords could never be hacked because they would be replaced by our keystroke dynamics. However, the way authentication for access to our computer systems has developed over the last half-century is that it is presented as a test to be overcome at the point of entry, similar to the way a passport is used at an airport. Keystroke dynamics take some time for baseline timing patterns to emerge and become established and thus they are not useful for authentication at point of entry, which is why we do not see them in common use today.
Keystroke logging has had other more successful applications including identifying different kinds of author writing strategies and understanding cognitive processes [12]. The premise here is that we establish a baseline for our keystroke timing information gathered over a long period and at any given period during the day we can compare the current dynamics with the baseline to see if we are typing faster, or slower, perhaps indicating that we are in full creative flow or that we are pondering our thoughts as we write. This also exploits pause location as we type and whether pauses occur between words, between sentences or even between paragraphs, and what insights into the author's thinking can be gleaned from such pauses [11].
Keystroke timing information has also been used for measuring stress [17], where the authors found that it is possible to classify cognitive and physical stress conditions relative to non-stress conditions based on keystroke and text features. It has also been used for emotion detection, where [9] provides a review of almost a dozen published papers addressing this specific topic, and that review was from 2013.
What previous work shows is that keystroke dynamics can provide untapped insights into our behaviour in a way which is non-intrusive, requires no investment in hardware, and uses up a minuscule amount of computer resources, yet this is a data source that we have largely ignored to date. In this paper we argue for keystroke logging as a data source for lifelogging and we illustrate our case using keystroke information collected from 4 participants over more than 6 months.

Collecting Keystroke Data
For collecting keystroke dynamics we used Loggerman [5], a comprehensive logging tool which can capture many aspects of our computer usage including keyboard, mouse and interface actions. This information is gathered ambiently and stored on the local computer. For keystrokes, Loggerman can record complete words typed by the participant, where words are separated by whitespace. When the participant is typing a password, recording is automatically disabled. Once installed, Loggerman simply records information to log files and uses a minuscule amount of CPU time. The status of Loggerman, i.e. whether it is recording or has been paused by the participant, appears on the computer GUI as an icon on the menu bar. Figure 2 shows two versions of the menu bar on an Apple Macbook, with Loggerman enabled and with Loggerman paused by the participant. From an examination of Loggerman files across several participants we see that participants regularly make use of autocomplete, and that they make typing errors and then use backspace or re-position their cursor with arrow keys to change a previously mistyped word or to fix a spelling error. Thus the number of fully-typed and correctly-typed words in Loggerman's word file is much less than we anticipated. The keystroke dynamics associated with such instances of cursor navigation and re-positioning will not be reflective of the ideal creative flow that we would like when we type, and thus the keystroke timing information for the overall logging period will have been "polluted" by this necessity of correcting typing errors or of re-phrasing. That is unfortunate, and some participants may have more of this than others, and even a given participant may have periods of more or less of the "flow" experience, as we discussed earlier when presenting keystroke logging for investigating writing strategies.
To illustrate the potential of keyboard dynamics in lifelogging we gathered information using Loggerman from 4 participants covering 1 January 2020 to 24 July 2020 (206 days) and we present an analysis of data from those subjects. Table 1 shows the amount of data generated. The number of days logged varies per participant because participants might disable logging and forget to resume it, and we also see an almost tenfold variation in the number of keystrokes typed on average per day between participants 1 and 4, with 2 and 3 in between. When we analysed the log files from across participants we found that many of the most frequently used characters are special characters like punctuation marks and numbers as well as keys for cursor navigation. For the purpose of our timing analysis we do not consider these special characters since they are not part of normal typing flow and so we consider only alphabetic characters A to Z. This reduces the number of keystrokes by almost half, so for participant 1 the total of 2,174,539 keystrokes reduces to 1,220,850 typed characters. For timing purposes we treat uppercase and lowercase as equal. A rationale for doing this is that it reduces the number of possible 2-character strings (bigrams) we work with to 26 × 26 = 676 possible combinations.
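As an illustration of the filtering just described, the following minimal Python sketch turns a stream of keystroke events into per-bigram timing lists. The input format, a list of (timestamp in milliseconds, key) pairs, and the function name are hypothetical and are not Loggerman's actual log format. Two further choices are our assumptions for the sketch rather than the method used in this paper: a non-letter key breaks the current letter sequence, and the 1,000ms cut-off used for the timing graphs later in the paper is applied at extraction time.

```python
from collections import defaultdict

MAX_GAP_MS = 1000  # cut-off borrowed from the timing graphs later in the paper


def bigram_timings(events, max_gap_ms=MAX_GAP_MS):
    """Map each A-Z bigram to a list of inter-key intervals in milliseconds.

    `events` is a hypothetical list of (timestamp_ms, key) tuples in typing order.
    """
    timings = defaultdict(list)
    prev_time, prev_key = None, None
    for time_ms, key in events:
        is_letter = len(key) == 1 and key.isascii() and key.isalpha()
        if not is_letter:
            # Assumption: punctuation, digits and navigation keys break the sequence.
            prev_time, prev_key = None, None
            continue
        letter = key.upper()  # uppercase and lowercase are treated as equal
        if prev_key is not None:
            gap = time_ms - prev_time
            if 0 <= gap <= max_gap_ms:
                timings[prev_key + letter].append(gap)
        prev_time, prev_key = time_ms, letter
    return dict(timings)


events = [(0, "t"), (120, "H"), (300, "e"), (400, " "), (450, "a"), (2000, "n")]
print(bigram_timings(events))
```

In this toy input the space resets the sequence and the long pause before "n" exceeds the cut-off, so only TH and HE are recorded.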
As mentioned earlier, a lifelog's usefulness increases when there are multiple sources of logged data gathered by the participant. Logging data on mood, emotion, stress or writing style at a given time was beyond the scope of this work, which focuses on keystroke dynamics only. However, in addition to keystroke logging we also gathered information on participants' sleep. There are a range of sleep tracking devices available off-the-shelf [14] and we used the Ōura ring [10]. This is a smart ring with in-built infrared LEDs, NTC temperature sensors, an accelerometer, and a gyroscope all wrapped into a ring form factor which gathers data for up to 7 days between charges. During sleep it measures heart rate including heart rate variability, body temperature, and movement, from which it can calculate respiration rate. From its raw data it computes a daily activity score, average METs, walking equivalent and a readiness score, and for sleep it records bedtime, awake time, sleep efficiency, total sleep and several other metrics, including an overall sleep score. From among all these options we use the overall score, a measure in the range 0 to 100 calculated using a proprietary algorithm which is a function of total sleep, sleep efficiency, restfulness, REM and deep sleep, latency and timing. Ōura's interpretation of the sleep score is that 85 or higher corresponds to an excellent night of sleep, 70-84 is a good night of sleep, while under 70 means the participant should pay attention to their sleep.
Our participants used a sleep logger for most of the 206 days of logging and for nights when the logger was not used we used a simple data imputation to fill the gap.
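The imputation method is not specified further here. Purely as an illustration of what a simple imputation could look like, the Python sketch below fills missing nights with the mean of the participant's recorded sleep scores. The choice of mean filling, the function name and the use of None for a missing night are all assumptions made for this example and should not be read as the procedure actually used.

```python
def impute_sleep_scores(scores):
    """Replace None entries with the mean of the recorded scores.

    Assumption: mean filling stands in for the unspecified "simple data imputation".
    """
    recorded = [s for s in scores if s is not None]
    if not recorded:
        raise ValueError("no recorded sleep scores to impute from")
    mean_score = sum(recorded) / len(recorded)
    return [mean_score if s is None else s for s in scores]


print(impute_sleep_scores([80, None, 70]))
```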

Data Analysis
In 2013 Peter Norvig published the results of his computation of letter, word and ngram frequencies drawn from the Google Books collection of 743,842,922,321 word occurrences in the English language. In this he found the top 5 most frequently occurring bigrams in English are TH, HE, IN, ER and AN, though some of the possible 676 bigrams will never or almost never appear, such as JT, QW or ZB. In our first analysis we focus on participant 1 as s/he gathered the largest volume of log data. From among the 369,467 individual words typed over 206 days, the top 10 most frequently occurring bigrams for all 4 participants are shown in Table 2, along with the top 10 as found from Norvig's analysis. This shows us that there is very little overlap among the top 10 actual bigrams typed by different participants, but we are not interested in the actual bigrams typed but in the timing of that typing. The distributions of timing information for each of the 200 most frequent bigrams over the 206 day logging period for participant 1 are shown in Figure 3. These individual graphs are too small to see any detail, but it is clear that the actual timing patterns for bigrams vary quite a lot among these top 200. For these graphs and the subsequent ones in this paper, we do not include inter-character timing gaps greater than 1,000ms and the graphs show the time taken for instances of each bigram plotted left to right from 1 January to 24 July. Figure 4 shows the frequencies of occurrence for those top-200 most frequently used bigrams from participant 1, highlighting that there are a small number of very frequently occurring bigrams and then it tails off in a Zipfian-like manner. This pattern is repeated for our other participants. When we look at how mean typing speeds for these top-200 bigrams from across the 206 days vary compared to the overall mean for participant 1, which is 204ms, there are a very small number of bigrams up to 150ms faster than the average and a small number of bigrams up to 150ms slower than the average. Most of the rest of these, approximately 80%, are between 75ms faster and 75ms slower than the average. Thus approximately 80% of bigram mean timings are clustered within an overall range of only 150ms, as shown in Figure 5. The mean and standard deviations for some bigram timings for participant 1 are shown in Figure 6. The fastest average of the 676 bigrams is OU with a mean time of 58ms but with a very large standard deviation of 83.3, while the slowest from among the top 200 bigrams is EH with a mean time of 358ms and standard deviation of 191. The bigram YO (ranked 179th most frequently occurring), with a mean time of 283ms and standard deviation of 81, has an interesting characteristic of never, ever being faster than about 200ms. This can only be explained as a quirky characteristic of the keyboard typing of participant 1 and we compare her/him to the others later.
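The comparison of per-bigram means against the overall mean can be sketched as follows. This is a minimal Python illustration, not the analysis code used for this paper: the input dictionary and function names are hypothetical, and we assume here that the overall mean is taken over all retained inter-key intervals rather than over the per-bigram means.

```python
from statistics import mean


def deviations_from_overall_mean(timings):
    """Return the overall mean interval and each bigram's mean minus that value.

    `timings` is a hypothetical dict mapping a bigram to its intervals in ms.
    Negative deviations are faster than average, positive ones slower.
    """
    all_intervals = [t for values in timings.values() for t in values]
    overall = mean(all_intervals)
    return overall, {bg: mean(v) - overall for bg, v in timings.items()}


def share_within(deviations, band_ms=75):
    """Fraction of bigrams whose mean lies within +/- band_ms of the overall mean."""
    inside = sum(1 for d in deviations.values() if abs(d) <= band_ms)
    return inside / len(deviations)


toy = {"AB": [190, 210], "CD": [250, 250], "EF": [100, 200]}
overall, devs = deviations_from_overall_mean(toy)
print(overall, devs, share_within(devs))
```

Applied to real data, share_within with a 75ms band corresponds to the proportion reported above as approximately 80% for participant 1.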
For participant 1 we found that some of the bigrams (XV and VV) have an average timing which is over 500ms slower than the overall average, indicating that this participant has trouble finding the XV and VV character combinations, while some other bigrams like EI and IN are 162ms and 151ms faster than the average. This might be due to the fingers usually used by her/him to type these particular character combinations. We would expect that using the middle and index fingers consecutively on keys that are adjacent and on the same row of the keyboard would be faster than, say, using the little and then the index finger on two keys that are on a lower and then a higher row of the keyboard. On checking with the participant as to which fingers s/he uses to type the fastest of these bigrams, we find that it is indeed the middle and index fingers for keys which are on the same row of the keyboard. Thus some of the timing characteristics may be explained by keyboard layout and the particular fingering that a subject will use for pressing different keys.
We also discovered a strange banding effect for this participant's timing information which is shown in Figure 7. For bigrams AS (ranked 31st), IM (90th), EW (99th), PL (124th), GH (146th) and DC (188th) there is a lower (faster) band of rapidly typed characters spanning right across the 206 day logging period, with a gap in timing before the more regular characteristic pattern of dense occurrences leading to more scattered occurrences as we approach 1,000ms. Our only explanation for this is that it is to do with the rapid typing of a regularly used word among this participant's wordlist, but this needs to be investigated further.
We now look at consistency of bigram timing characteristics for participants across each day of their logging period. If we take the top-200 most frequent bigrams and rank order them by their mean speed for each of their logging days and then correlate the bigram rankings for each day, the average pairwise correlation for participant 1 is 0.6262. The  3, though when we get to the top 5 bigrams the correlation drops to 0.7937, which is explained by the fact that the top-5 mean fastest bigrams may vary from day to day. This is highlighted in our comparison to Peter Norvig's analysis at the start of this section. Table 3 also shows the same analysis applied to the other 3 participants and from this we see the others have a much lower correlation among the bigrams they have typed fastest. Also included in Table 3, and taken from Table 1 earlier, is the average number of keystrokes typed by each participant per day and we can clearly see that the more a participant types, the greater the consistency of their typing speeds. Putting this in other words, no matter what day it is, participant 1 has almost the same ordering of her/his ranked bigram timings whereas for the others that ordering will vary more, probably because they type less than participant 1.
As mentioned previously, a lifelog becomes most useful when there are multiple sources of data which are then cross-referenced to gain insights into the participant's life. We saw in section 2 how keystroke dynamics has been used for measuring stress [17], emotion [9] and even our level of creative flow when writing [11]. Using the sleep score data gathered by the Ōura ring, we explored whether sleep score correlates with bigram timing for any or all of our top-200 bigrams. Mean daily timing data for the bigram TH had a +0.209 correlation with sleep score from the previous night while mean daily timing data for CV had a correlation of -0.18. The average of these bigram correlations with sleep score was +0.014, which leads us to conclude that there is no correlation between daily typing speed of any bigrams and sleep score, meaning, in turn, that participant 1 is consistent in typing speed, even when tired from poor sleep the previous night. When we applied this to other participants we found the same result. Perhaps if we explored windowing sleep score as a moving average over a number of days, or used other metrics to measure fatigue, then that might correlate with timing information.
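Both the day-to-day consistency measure and the comparison with sleep score rest on correlation, and the following minimal Python sketch shows one way they could be computed. It is an illustration under stated assumptions rather than the code used for this paper: the type of correlation coefficient is not named in the text, so Spearman rank correlation is assumed for the daily bigram rankings and Pearson correlation for the sleep comparison. All function names and the input layout (one list of mean timings per day, with the same bigrams in the same order and none missing) are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def ranks(values):
    """Ranks starting at 1 for the smallest value; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def mean_pairwise_day_correlation(daily_means):
    """Average Spearman correlation over all pairs of logging days."""
    pairs = list(combinations(daily_means, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


# Toy data: mean timings (ms) for the same three bigrams on three days.
days = [[100, 200, 300], [110, 190, 310], [300, 200, 100]]
print(mean_pairwise_day_correlation(days))

# Toy data: daily mean timing for one bigram against the previous night's sleep score.
print(pearson([200, 210, 190, 205], [80, 70, 90, 75]))
```

For the sleep comparison, each day's mean bigram timing is paired with the sleep score from the night before, so the two input sequences must be aligned accordingly before calling pearson.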
Another possibility is that bigram timing information might vary during the day, differing from morning to evening as fatigue effects might alternate with bursts of energy or bursts of enthusiasm, or as the participant's level of stress or emotion or creativity might vary, but that would require a more comprehensive lifelog. This goes back to the point we made in the introduction about the best kind of lifelog being a multimodal artifact, a fusion across multiple, diverse information sources. The exercise reported in this paper has served to illustrate the possibilities that keystroke dynamics have as one of those information sources.

Conclusions
In this paper we present the case for greater use of keystroke dynamics in lifelogging. We report on several previous applications for keystroke dynamics and we describe how we used a tool called Loggerman to gather keystroke data for 4 participants over more than 6 months. We are particularly interested in the timing information associated with keystrokes and we showed how timing information between bigram keystrokes can vary for the same participant across different days. We also showed how the relative speeds with which bigrams are typed vary hugely for the same participant and also across different participants, showing how useful keystroke dynamics can be for security and authentication applications.
Keystroke dynamics has been shown to correlate with stress, fatigue and writing style and in this preliminary analysis we explored whether keystroke timing was correlated with fatigue, as measured by sleep score from the previous night. Unfortunately we found no correlation between these, suggesting that a simple sleep score is insufficient to measure participant fatigue and that we need more fine-grained measures which would allow levels of fatigue which vary throughout the day to be measured.
For future work there are a range of ways in which data from keystroke dynamics could be used as part of a lifelog, especially to gain insights into the more complex cognitive processes in which we engage each day. Keystroke timing information has been shown to reveal writing strategies on the conventional keyboard/screen/mouse setup but it would be interesting to explore keystroke dynamics on mobile devices and see how that correlates with stress, cognitive load from multi-tasking, fatigue and distraction.

Fig. 3. Timing intervals over 206 days for participant 1 for each of the top-200 bigrams ranked by mean overall speed

Table 1. Keystroke information for our 4 participants (columns: Total Keystrokes, No. of logged days, Average Keystrokes/day)

Table 2. 10 most frequently used bigrams for participants

Table 3. Correlation among top bigrams ranked by frequency for logging period