Habits and Affects: Learning by an Associative Two-Process

Robert Lowe

doi:10.3390/IS4SI-2017-04113

Abstract:

In animal learning theory, the notion of habits is frequently used to describe instrumental behaviour that is, among other things, inflexible (i.e. slow to change), unconscious and insensitive to reinforcer devaluation (Dickinson 1985, Seger & Spiering 2011). It has also been suggested that learning with reinforcement learning algorithms partly reflects a transition from affect-based to more habit-based behaviour (Seger & Spiering 2011), in a setting where dual working memory systems exist: an affective working memory and a standard (e.g. spatial) working memory (Davidson & Irwin 1999, Watanabe et al. 2007).
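Insensitivity to reinforcer devaluation is usually illustrated by contrasting a habit with goal-directed control that consults the current value of the outcome. The sketch below is a hypothetical toy, not a model from this work: a "habitual" controller responds from a stimulus-response strength cached during training with a simple delta rule, while a "goal-directed" controller looks up the outcome's current value at choice time. The function names, the learning rate `alpha` and the response `threshold` are all illustrative assumptions.

```python
# Hypothetical toy contrasting habitual and goal-directed control under
# reinforcer devaluation. Names and parameters are illustrative assumptions.

def train_habit_strength(outcome_value: float, trials: int, alpha: float = 0.2) -> float:
    """Cache a stimulus-response strength with a delta rule toward the outcome value."""
    strength = 0.0
    for _ in range(trials):
        strength += alpha * (outcome_value - strength)
    return strength

def habitual_responds(cached_strength: float, threshold: float = 0.5) -> bool:
    # Uses only the cached S-R strength; the outcome's current value is never consulted.
    return cached_strength > threshold

def goal_directed_responds(outcome_values: dict, outcome: str, threshold: float = 0.5) -> bool:
    # Consults the current value of the outcome the response is expected to produce.
    return outcome_values[outcome] > threshold

outcome_values = {"food": 1.0}
strength = train_habit_strength(outcome_values["food"], trials=50)

outcome_values["food"] = 0.0  # reinforcer devaluation (e.g. satiety)

print(habitual_responds(strength))                     # True: still responds
print(goal_directed_responds(outcome_values, "food"))  # False: stops responding
```

Because the cached strength is not updated by devaluation (no new training trials occur), only the goal-directed controller tracks the change in outcome value.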

Associative Two-Process theory has been proposed to explain phenomena that emerge from differential outcomes training. In this procedure, animals (and sometimes humans) are presented with stimuli/objects that uniquely identify differential outcomes, e.g. a circle stimulus precedes the presentation of a food outcome, and a square stimulus precedes the presentation of a toy outcome. Obtaining each outcome is, in turn, contingent on a specific response, e.g. press the right button to obtain the food, press the left button to obtain the toy. Manipulating these stimulus-response-outcome contingencies reveals two types of memory: a ‘standard’ (retrospective) working memory of stimulus-response associations, and a ‘prospective’ memory in which a stimulus-expectation-response sequence is followed.
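The contingencies in the example above can be written down as a small table. The following is a minimal sketch, assuming a two-stimulus task; the identifiers and the common-outcome control condition are hypothetical additions for contrast. It checks the property that makes differential outcomes useful: each outcome is produced by exactly one correct response, so an expectation of that outcome can by itself cue the response.

```python
# Hypothetical encoding of the example contingencies:
# circle -> press right -> food, square -> press left -> toy.
DIFFERENTIAL = {
    "circle": ("press_right", "food"),
    "square": ("press_left", "toy"),
}
# Assumed non-differential (common-outcome) control for contrast:
# both correct responses produce the same outcome.
COMMON = {
    "circle": ("press_right", "food"),
    "square": ("press_left", "food"),
}

def outcome_predicts_response(contingencies: dict) -> bool:
    """True if each outcome is produced by exactly one correct response."""
    response_for_outcome = {}
    for response, outcome in contingencies.values():
        if response_for_outcome.setdefault(outcome, response) != response:
            return False
    return True

def run_trial(contingencies: dict, stimulus: str, response: str):
    """Deliver the outcome only if the correct response follows the stimulus."""
    correct_response, outcome = contingencies[stimulus]
    return outcome if response == correct_response else None

print(outcome_predicts_response(DIFFERENTIAL))           # True
print(outcome_predicts_response(COMMON))                 # False
print(run_trial(DIFFERENTIAL, "circle", "press_right"))  # food
print(run_trial(DIFFERENTIAL, "circle", "press_left"))   # None
```

In the common-outcome condition only the retrospective stimulus-response memory can disambiguate the correct response, whereas in the differential condition the outcome expectation carries that information too.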

The neural dynamic relationship between the purported dual memory structures may vary depending on the stage of learning the animal or human (agent) has reached. It has previously been suggested, and demonstrated in a neural-computational model (Lowe et al. 2014), that a working memory route is critical in initial learning trials, in which the agent is presented sequentially with a given stimulus, action/behavioural options and, finally, an outcome (e.g. a rewarding stimulus or its absence). Subsequent trials lead to a dominance of affective (or, more generally, prospective) memory that effectively scaffolds the learning of the outcome-achieving stimulus-response rules under conditions of relative uncertainty. Finally, during later stages of learning, more ‘habitual’ responding may occur, in which the retrospective route becomes dominant and ‘overshadows’ the prospective memory.

In neuroanatomical terms, candidate structures for implementing prospective memory include the orbitofrontal cortex (OFC), which is considered to enable fast, flexible and context-based learning (particularly important in studies of reversal learning, e.g. Delamater 2007). In contrast, the amygdala is considered less flexible, i.e. resistant to unlearning, but nevertheless critical for learning the values of stimuli (Schoenbaum et al. 2007). Furthermore, the interplay between the basolateral division of the amygdala (BLA) and the OFC may be crucial in differential reward evaluation (Ramirez & Savage 2007). Passingham and Wise (2012) have suggested that the medial prefrontal cortex (PFC) plays a critical role in encoding outcome-contingent choice, whereas Watanabe et al. (2007) have provided evidence for the lateral PFC integrating inputs from ‘retrospective’ (working memory) areas such as the dorsal PFC and from ‘prospective’ (outcome-expectant) areas such as the OFC and medial PFC.

Urcuioli (2005, 2013) proposes that outcome expectancies (from prospective memory) provide a means to classify stimuli effectively. Action selection can then be simplified by exploiting the affordances of the subset of actions already associated with each outcome expectancy class. This explains why participants in certain forms of differential outcomes training can immediately select the unique action that leads to the desired outcome even though they have never experienced that stimulus-action (response) contingency: they have already classified the stimuli according to an outcome expectancy that was previously associated with the action.
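The class-based transfer described above can be illustrated with a lookup sketch. This is a hypothetical simplification, not Urcuioli's formal account or the author's model: an instrumental phase links outcome expectancies to responses, a second phase pairs new stimuli (the hypothetical `red_light` and `tone`) with outcomes only, and at test the new stimuli select responses through the stimulus → expectancy → response route, although no direct stimulus-response link was ever trained for them.

```python
# Hypothetical sketch of stimulus class formation via shared outcome expectancy.
# Phase 1 (instrumental): stimulus -> response -> outcome.
phase1 = [("circle", "press_right", "food"), ("square", "press_left", "toy")]
# Phase 2 (assumed Pavlovian pairing): new stimuli -> outcome, no response made.
phase2 = [("red_light", "food"), ("tone", "toy")]

stimulus_to_expectancy = {}
expectancy_to_response = {}

for stimulus, response, outcome in phase1:
    stimulus_to_expectancy[stimulus] = outcome
    expectancy_to_response[outcome] = response

for stimulus, outcome in phase2:
    # Only the stimulus-outcome pairing is learned here.
    stimulus_to_expectancy[stimulus] = outcome

def select_response(stimulus: str):
    """Prospective route: stimulus -> outcome expectancy -> response."""
    expectancy = stimulus_to_expectancy.get(stimulus)
    return expectancy_to_response.get(expectancy)

# Direct (retrospective) S-R links exist only for the Phase 1 stimuli.
trained_sr = {(s, r) for s, r, _ in phase1}

print(("red_light", "press_right") in trained_sr)  # False: never trained
print(select_response("red_light"))                # press_right
print(select_response("tone"))                     # press_left
```

Grouping stimuli by the outcome they predict means a new class member inherits the response already attached to that class, which is the shortcut the prospective route offers over learning each stimulus-response rule separately.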

In this work, I discuss the associative two-process model in relation to (standard) working memory and ‘affective working memory’ (Watanabe et al. 2007) as providing a means to classify stimuli. I refer to a number of animal learning paradigms that demonstrate the potential for anticipation of reward and of reward omission to be associated with reward-promoting behaviour (cf. Overmier & Lawry 1979, Kruse & Overmier 1982, Urcuioli 2013, Lowe et al. 2016, Lowe & Billing 2017), and to neural-computational aspects of the interplay between affective (prospective) and working (retrospective) memory that may yield more habitual behaviour. I show that, within an associative two-process context, habits can also be understood in terms of affective working memory, specifically in relation to reward acquisition expectation and reward omission expectation. In this context, habits are considered behaviours that are inflexibly selected in spite of reinforcer devaluation, and their rigidity reflects the certainty or uncertainty of a particular rewarding outcome.

Finally, I discuss the implications of such learning of habits, and of affective mediation of behaviour, particularly for memory and clinical conditions (e.g. Alzheimer’s disease) and for children’s learning. This may inform new digital solutions for interventions with senior citizens and for pedagogy in relation to child development.

References

Dickinson, A. (1985). Actions and habits: the development of behavioural autonomy. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 308(1135), 67-78.

Davidson, R. J. and Irwin, W. (1999). The functional neuroanatomy of emotion and affective style. Trends in Cognitive Sciences, 3(1), 11–21.

Delamater, A. R. (2007). The role of the orbitofrontal cortex in sensory-specific encoding of associations in Pavlovian and instrumental conditioning. Annals of the New York Academy of Sciences, 1121(1), 152–173.

Kruse, J. M., and Overmier, J. B. (1982). Anticipation of reward omission as a cue for choice behavior. Learning and Motivation, 13, 505–525. 

Lowe, R., Sandamirskaya, Y. and Billing, E. (2014). The actor - differential outcomes critic: A neural dynamic model of prospective overshadowing of retrospective action control. The Fourth Joint IEEE Conference on Development and Learning and on Epigenetic Robotics, pp. 440–447.

Lowe, R., Almer, A., Lindblad, G., Gander, P., Michael, J. and Vesper, C. (2016). Minimalist social-affective value for use in joint action: A neural-computational hypothesis. Frontiers in Computational Neuroscience, 10(88).

Lowe, R. and Billing, E. (2017). Affective-Associative Two-Process theory: A neural network investigation of adaptive behaviour in differential outcomes training. Adaptive Behavior, 25(1), 5–23.

Overmier, J. B., & Lawry, J.A. (1979). Pavlovian conditioning and the mediation of behavior. The Psychology of Learning and Motivation, 13, 1–55.

Passingham, R. and Wise, S. (2012). The neurobiology of the prefrontal cortex: anatomy, evolution, and the origin of insight, vol 50. Oxford University Press.

Ramirez, D. and Savage, L. (2007). Differential involvement of the basolateral amygdala, orbitofrontal cortex, and nucleus accumbens core in the acquisition and use of reward expectancies. Behavioral Neuroscience, 121(5), 896–906.

Schoenbaum, G., Saddoris, M. and Stalnaker, T. (2007). Reconciling the roles of orbitofrontal cortex in reversal learning and the encoding of outcome expectancies. Annals of the New York Academy of Sciences, 1121, 320–335.

Seger, C. A. and Spiering, B. J. (2011). A critical review of habit learning and the basal ganglia. Frontiers in Systems Neuroscience, 5.

Urcuioli, P. J. (2005). Behavioral and associative effects of differential outcomes in discrimination learning. Learning & Behavior, 33(1), 1–21.

Urcuioli, P. (2013). Stimulus control and stimulus class formation. In Madden, G. J., Dube, W. V., Hackenberg, T. D., Hanley, G. P., & Lattal, K. A. (eds), APA Handbook of Behavior Analysis (Vol. 1, pp. 361–386). Washington, DC: American Psychological Association.

Watanabe, M., Hikosaka, K., Sakagami, M., & Shirakawa, S. (2007). Reward expectancy-related prefrontal neuronal activities: Are they neural substrates of ‘‘affective’’ working memory? Cortex, 43, 53–64.