How Dementia Affects Conversation: Building a More Accessible Conversational AI

Diving into the Literature – What do we know?


We all (roughly) know how to naturally converse with one another. This is mostly subconscious and only really noticeable if an interaction skews from what most consider “normal.” In the majority of cases, these are just minor differences, such as someone speaking a little too close or interrupting more often than usual.

However, more significant conversational differences can start to occur when parts of the brain begin to decline in performance.

Contents

Introduction
Overview of Dementia
Papers Covered
Motivation – Why Speech?
Datasets
Models
Important Language Features
Conclusion

Introduction

I am currently working towards creating a more natural conversational agent (such as Siri, Alexa, etc.) for those with cognitive impairments, who can potentially benefit the most from these systems. Currently, we have to adapt how we speak to these systems and know exactly how to ask for certain functions. For those who struggle to adapt, I hope to lower some of these barriers so that they can live more independently for longer. If you want to read more about the overall project, I discussed it in more detail in an interview here.

To kick off this project with Wallscope and The Data Lab, I first investigated some of the research centered on recreating natural conversation with conversational agents. This research all related to a healthy population, but a question arose: do some of these phenomena vary when conversing with those that have forms of cognitive impairment?

In my previous article, I covered two papers that discuss end-of-turn prediction. They created brilliant models that predict when someone has finished their turn, replacing models that simply wait for a fixed duration of silence.

If someone with dementia takes a little longer to remember the word(s) they’re looking for, the silence-threshold models used in current systems will interrupt them. I suspect the research models would also perform worse than with a healthy population, so I’m collecting a corpus to investigate this.
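
To make the interruption problem concrete, here is a minimal sketch of fixed silence-threshold end-of-turn detection. The 0.7-second cutoff is an illustrative assumption of mine, not a value from any of the papers:

```python
# Minimal sketch of silence-threshold end-of-turn detection.
# The threshold is a hypothetical value for illustration only.

SILENCE_THRESHOLD_S = 0.7  # assumed fixed cutoff, in seconds

def ends_turn(pause_duration_s: float) -> bool:
    """A fixed-threshold endpointer: any pause longer than the
    threshold is treated as the end of the speaker's turn."""
    return pause_duration_s >= SILENCE_THRESHOLD_S

# A healthy speaker's short pause is (correctly) not an end of turn:
print(ends_turn(0.4))  # False
# A word-finding pause of a couple of seconds triggers an interruption:
print(ends_turn(2.1))  # True -- the system barges in mid-thought
```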

As my ultimate aim is to make conversational agents more naturally usable for those with dementia, I’ll dive into some of the related research in this article.

Overview of Dementia

I am by no means a dementia expert, so this information was all collected from an amazing series of videos by the Alzheimer’s Society.

Their Website

Dementia is not a disease but the name for a group of symptoms that commonly include problems with:

  • Memory
  • Thinking
  • Problem Solving
  • Language
  • Visual Perception

For people with dementia, these symptoms have progressed enough to affect daily life and are not a natural part of aging, as they’re caused by different diseases (I highlight some of them below).

All of these diseases cause the loss of nerve cells, and this gets gradually worse over time, as these nerve cells cannot be replaced.

As more and more cells die, the brain shrinks (atrophies) and symptoms worsen. Which symptoms set in first depends on which part of the brain atrophies, so people are impacted differently.

source — you can see the black areas expanding as nerve cells die and atrophy progresses.

For example, if the occipital lobe begins to decline, then visual symptoms would progress, whereas losing the temporal lobe would cause language problems…

Other common symptoms impact:

  • Day-to-day memory
  • Concentration
  • Organization
  • Planning
  • Language
  • Visual Perception
  • Mood

There is currently no cure…

Before moving on to cover recent research surrounding language problems, it’s important to note that most research is disease-specific. Therefore, I’ll briefly cover four common types of dementia.

All of this information again comes from the series of videos created by the Alzheimer’s Society.

Alzheimer’s Disease

The most common type of dementia is Alzheimer’s Disease (AD), and for this reason, it’s also the most understood (you’ll notice this in the research).

A healthy brain contains proteins (two of which are called amyloid and tau), but if the brain starts to function abnormally, these proteins form abnormal deposits called plaques and tangles.


These plaques and tangles damage nerve cells, which causes them to die and the brain to shrink, as shown above.

The hippocampus is usually the first area of the brain to decline in performance when someone has AD. This is unfortunately where memories are formed, so people will often forget what they have just done and may therefore repeat themselves in conversation.

Recent memories are lost first, whereas childhood memories can still be retrieved as they depend less on the hippocampus. Additionally, emotions can usually be recalled as the amygdala is still intact, whereas the facts surrounding those emotions can be lost.

AD progresses gradually, so symptoms slowly worsen and become more numerous over time.

Vascular Dementia

The second most common type of dementia is vascular dementia, which is caused by problems with the brain’s blood supply.

Nerve cells need oxygen and nutrients to survive, so without them they become damaged and die. Therefore, when blood supply is interrupted by a blockage or leak, significant damage can be caused.

As with AD, symptoms depend on which parts of the brain are impacted. When the damaged areas are responsible for memory, thinking, or language, the person will have problems remembering, thinking, or speaking.


Vascular dementia can be caused by strokes. Sometimes one major stroke can cause it, but in other cases a person may suffer from multiple smaller strokes that gradually cause damage.

The most common cause of vascular dementia is small-vessel disease, which gradually narrows the vessels in the brain. As the narrowing continues and spreads, more of the brain gets damaged.

Vascular dementia can therefore have a gradual progression like AD or, if caused by strokes, a step-like progression with symptoms worsening after each stroke.

Dementia with Lewy Bodies

Closely related to AD, but less common, is a type of dementia called dementia with Lewy bodies.

Lewy bodies are tiny clumps of protein that develop inside nerve cells in the brain. They disrupt communication between cells, which causes the cells to die.


Researchers have not yet identified why or how Lewy bodies form. We do know, however, that they can form in any part of the brain, which, again, leads to varying symptoms.

People can have problems with concentration, movement, and alertness, and can even experience visual hallucinations. These hallucinations are often distressing and lead to sleep problems.

Dementia with Lewy bodies progresses gradually and spreads as more nerve cells get damaged, so memory is always impacted eventually.

Frontotemporal dementia

The last type of dementia I’ll cover is frontotemporal dementia (FTD), which is a range of conditions in which cells in the frontal and temporal lobes of the brain are damaged.


FTD is again a less common type of dementia but is, surprisingly, more likely to affect younger people (under 65).

The frontal and temporal lobes of the brain control behavior, emotion, and language, so which symptoms appear first depends on which lobe is impacted first.

The frontal lobe is usually the first to decline in performance, so changes begin to show through a person’s personality, behavior, and inhibitions.

Alternatively, when the temporal lobe is impacted first, a person will struggle with language. For example, they may struggle to find the right word.

FTD is thought to occur when proteins such as tau build up in nerve cells, but unlike the other types, it is more likely to be hereditary.

Eventually as FTD progresses, symptoms of frontal and temporal damage overlap, and both occur.

Papers Covered

That overview of dementia was fairly in depth, so we should now have a common foundation for this article and all subsequent articles.

As we now know, difficulty with language is a common symptom of dementia, so in order to understand how it changes, I’ll cover four papers that investigate this. These include the following:

[1]

A Speech Recognition Tool for Early Detection of Alzheimer’s Disease by Brianna Marlene Broderick, Si Long Tou and Emily Mower Provost

[2]

A Method for Analysis of Patient Speech in Dialogue for Dementia Detection by Saturnino Luz, Sofia de la Fuente and Pierre Albert

[3]

Speech Processing for Early Alzheimer Disease Diagnosis: Machine Learning Based Approach by Randa Ben Ammar and Yassine Ben Ayed

[4]

Detecting Cognitive Impairments by Agreeing on Interpretations of Linguistic Features by Zining Zhu, Jekaterina Novikova and Frank Rudzicz

Note: I will refer to each paper by its corresponding number from now on.

Motivation – Why Speech?

These four papers have a common motivation: to detect dementia in a more cost-effective and less intrusive manner.

These papers tend to focus on Alzheimer’s Disease (AD) because, as [3] mentions, 60–80% of dementia cases are caused by AD. I would add that this is likely also why AD features most heavily in existing datasets.

Current Detection Methods

[1] points out that dementia is relatively difficult to diagnose as progression and symptoms vary widely. The diagnostic processes are therefore complex, and dementia often goes undiagnosed because of this.

[2] explains that imaging (such as PET or MRI scans) and cerebrospinal fluid analysis can be used to detect AD very accurately, but these methods are expensive and extremely invasive. A lumbar puncture must be performed to collect cerebrospinal fluid, for example.

source – lumbar puncture aka “spinal tap”

[2] also points out that neuropsychological detection methods have been developed that can, to varying levels of accuracy, detect signs of AD. [1] adds that these often require repeat testing and are therefore time-consuming and cause additional stress and confusion to the patient.

As mentioned above, [1] argues that dementia often goes undiagnosed because of these flaws. [2] agrees that it would be beneficial to detect AD pathology long before someone is actually diagnosed in order to implement secondary prevention.

Will Speech Analysis Help?

As repeatedly mentioned in the overview of dementia above, language is known to be impacted through various signs such as struggles with word-finding, understanding difficulties, and repetition. [3] points out that language relies heavily on memory, and for this reason, one of the earliest signs of AD may be in a person’s speech.


[2] reinforces this point by highlighting the fact that in order to communicate successfully, a person must be able to perform complex decision making, strategy planning, consequence foresight, and problem solving. These are all impaired as dementia progresses.

Practically, [2] states that speech is easy to acquire and elicit, so they (along with [1], [3], and [4]) propose that speech could be used to diagnose dementia in a cost-effective, non-invasive, and timely manner.

To start investigating this, we need the data.

Datasets

As you can imagine, it isn’t easy to acquire suitable datasets to investigate this. For this reason [1], [3], and [4] used the same dataset from DementiaBank (a repository within TalkBank) called the Pitt Corpus. This corpus contains audio and transcriptions of people with AD and healthy elderly controls.

To elicit speech, participants (both groups) were asked to describe the Cookie Theft stimulus photo:


Some participants had multiple visits, so [1], [3], and [4] had audio and transcriptions for 223 control interviews and 234 AD interviews (these numbers differ slightly between them due to pre-processing, I expect).

[1] points out that the picture description task ensures the vocabulary and speech elicited is controlled around a context, but [2] wanted to investigate a different type of speech.

Instead of narrative or picture description speech, [2] used spontaneous conversational data from the Carolina Conversations Collection (CCC) to create their models.

The corpus contains 21 interviews with patients with AD and 17 dialogues with control patients. These control patients suffered from other conditions such as diabetes, heart problems, etc… None of them had any neuropsychological conditions, however.

The automatic detection of AD developed by [2] was the first use of low-level dialogue interaction data as a basis for AD detection on spontaneous spoken language.

Models

If I’m to build a more natural conversational system, then I must be aware of the noticeable differences in speech between those with dementia and healthy controls. The features that most inform the models in these papers should indicate exactly that.

[1] extracted features that are known to be impacted by AD (I run through the exact features in the next section, as that’s my primary interest). They collected many transcription-based features and acoustic features before using principal component analysis (PCA) to reduce the total number of features to train with. Using the selected features, they trained KNN and SVM classifiers, achieving an F1 of 0.73 and, importantly, a recall of 0.83, as false negatives could be dangerous.
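
As a rough illustration of that pipeline shape (features, then PCA, then a classifier scored on recall as well as F1), here is a minimal scikit-learn sketch. The data, component count, and hyperparameters are placeholders, not the settings reported in [1]:

```python
# Sketch of a [1]-style pipeline: features -> PCA -> classifier,
# scored with recall as well as F1. All data and hyperparameters are
# placeholders, not the paper's actual setup.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
X = rng.normal(size=(457, 40))    # e.g. 457 interviews x 40 features (dummy)
y = rng.integers(0, 2, size=457)  # 1 = AD, 0 = control (dummy labels)

model = make_pipeline(StandardScaler(), PCA(n_components=10), SVC())
scores = cross_validate(model, X, y, cv=5, scoring=["f1", "recall"])
print(scores["test_f1"].mean(), scores["test_recall"].mean())
```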

[2] decided to rely only on content-free features, including speech rate, turn-taking patterns, and other parameters. They found that when they used speech rate and turn-taking patterns to train a Real AdaBoost algorithm, they achieved an accuracy of 86.5%, and adding more features reduced the number of false positives. Other models performed comparably well, but even though Real AdaBoost and decision trees both achieved an accuracy of 86.5%, the authors say there’s still room for improvement.

One point to highlight about [2] is their high accuracy (comparable to the state-of-the-art) despite relying only on content-free features. Their model can therefore be used globally, as the features are not language-dependent like the more complex lexical, syntactic, and semantic features used by other models.
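
To illustrate what “content-free” means in practice, here is a sketch of timing-only features computed from diarized, timestamped turns. These simplified definitions are my own stand-ins for the kinds of parameters [2] describes (not their exact feature set), and scikit-learn’s AdaBoostClassifier stands in for Real AdaBoost:

```python
# Illustrative content-free features: computed purely from turn timing,
# never from the words, so nothing here depends on a particular language.
# Simplified stand-ins for the parameters [2] describes.
from sklearn.ensemble import AdaBoostClassifier  # stand-in for Real AdaBoost

def turn_features(turns):
    """turns: list of (start_s, end_s, n_words) for the patient's turns."""
    durations = [end - start for start, end, _ in turns]
    total_words = sum(n for _, _, n in turns)
    speech_rate = total_words / sum(durations)       # words per second
    intervals = [turns[i + 1][0] - turns[i][1]       # time between the
                 for i in range(len(turns) - 1)]     # patient's turns
    mean_interval = sum(intervals) / max(len(intervals), 1)
    mean_turn_len = sum(durations) / len(durations)
    return [speech_rate, mean_interval, mean_turn_len]

# Example: three patient turns as (start, end, word count), in seconds
print(turn_features([(0.0, 4.5, 12), (9.0, 11.0, 4), (15.5, 20.0, 9)]))

# X = [turn_features(t) for t in all_interviews]; y = AD/control labels
# model = AdaBoostClassifier().fit(X, y)
```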


[3] ran feature extraction, feature selection, and then classification. Many syntactic, semantic, and pragmatic features were transcribed in the corpus. They tried three feature selection methods, namely Information Gain, KNN, and SVM Recursive Feature Elimination. This feature selection step is particularly interesting for my project. Using the features selected by the KNN method, their most accurate model was an SVM that achieved a precision of 79%.
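
For those who want to experiment, here is a rough scikit-learn approximation of those three selection methods on placeholder data. Mutual information stands in for Information Gain, and a forward-selection wrapper around a KNN is one plausible reading of their KNN-based selection; the paper’s exact procedure may differ:

```python
# Approximating [3]'s three feature-selection methods in scikit-learn.
# Random placeholder data; mutual information stands in for Information Gain.
import numpy as np
from sklearn.feature_selection import (SelectKBest, mutual_info_classif,
                                       RFE, SequentialFeatureSelector)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 25)), rng.integers(0, 2, size=200)

ig = SelectKBest(mutual_info_classif, k=8).fit(X, y)
knn = SequentialFeatureSelector(
    KNeighborsClassifier(), n_features_to_select=8).fit(X, y)
rfe = RFE(LinearSVC(), n_features_to_select=8).fit(X, y)

# Features chosen by all three methods (cf. the paper's intersection):
sets = [set(np.flatnonzero(sel.get_support())) for sel in (ig, knn, rfe)]
print("selected by all three:", sets[0] & sets[1] & sets[2])
```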

[4] introduces a completely different (and more interesting) approach from the other papers, as they build a Consensus Network (CN).

As [4] uses the same corpus as [1] and [3], there’s a point at which the only two ways to improve upon previous classifiers are to either add more data or calculate more features. Of course, both of those options have limits, so this is why [4] takes a novel approach.

They first split the extracted features into non-overlapping subsets and found that the three naturally occurring groups (acoustic, syntactic, and semantic) garnered the best results.

The 185 acoustic features, 117 syntactic features, and 31 semantic features (plus 80 POS features that were mainly semantic) were used to train three separate neural networks called “ePhysicians”:

(architecture figure from [4])

Each ePhysician is a fully connected network with ten hidden layers, Leaky ReLU activations, and batch normalization. The classifier and the discriminator shared this structure but had no hidden layers.

The output of each ePhysician was passed one by one into the discriminator (with noise), which then tried to tell the ePhysicians apart. This encourages the ePhysicians to output representations that are indistinguishable from one another (that is, to agree).
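
To make the agreement mechanism concrete, here is a heavily pared-down PyTorch sketch of the idea. The depth, representation size, and loss are simplifications of mine; see [4] for the actual architecture and training procedure:

```python
# Heavily simplified sketch of [4]'s consensus-network idea: one
# "ePhysician" per feature subset, a discriminator that guesses which
# ePhysician produced a (noised) representation, and ePhysicians trained
# to fool it, i.e. to agree. Sizes and depth are pared down for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

feature_dims = {"acoustic": 185, "syntactic": 117, "semantic": 31}
REP = 32  # shared representation size (my assumption, not from the paper)

ephysicians = nn.ModuleDict({
    name: nn.Sequential(nn.Linear(dim, REP), nn.BatchNorm1d(REP),
                        nn.LeakyReLU())
    for name, dim in feature_dims.items()
})
discriminator = nn.Linear(REP, len(feature_dims))   # "which ePhysician?"
classifier = nn.Linear(REP * len(feature_dims), 2)  # AD vs. control

def discriminator_loss(reps):
    """Cross-entropy for guessing the source of each noised representation.
    The discriminator minimizes this; the ePhysicians maximize it, which
    pushes their representations to be indistinguishable (to agree)."""
    loss = 0.0
    for i, rep in enumerate(reps):
        noisy = rep + 0.1 * torch.randn_like(rep)  # noise helped, per [4]
        target = torch.full((rep.size(0),), i, dtype=torch.long)
        loss = loss + F.cross_entropy(discriminator(noisy), target)
    return loss

# Forward pass for one batch of interviews:
# reps = [ephysicians[name](x) for name, x in batch.items()]
# prediction = classifier(torch.cat(reps, dim=1))
```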

[4] indeed found that their CN, with the three naturally occurring and non-overlapping subsets of features, outperformed other models with a macro F1 of 0.7998. Additionally, [4] showed that the inclusion of noise and cooperative optimization did contribute to the performance.

In case of confusion, it’s important to reiterate that [2] used a different corpus.

Each paper, especially [4], describes their model in more detail, of course. I’m not primarily interested in the models themselves, as I don’t intend to diagnose dementia. My main focus in this article is to find out which features were used to train these models, as I’ll have to pay attention to the same features.

Important Language Features

For a conversational system to perform more naturally for those with cognitive impairments, we must investigate how their language changes.

[4] sent all features to their ePhysicians, so they didn’t detail which features were most predictive. They did mention that pronoun-noun ratios are known to change, as those with cognitive impairments use relatively more pronouns than nouns.
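
As a quick aside, the pronoun-noun ratio is straightforward to compute from a transcript with an off-the-shelf POS tagger. Here is a sketch using spaCy (my tool choice for illustration, not one specified by [4]):

```python
# Computing a pronoun-noun ratio with spaCy's POS tagger (tool choice is
# mine for illustration; [4] does not specify an implementation).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def pronoun_noun_ratio(transcript: str) -> float:
    doc = nlp(transcript)
    pronouns = sum(tok.pos_ == "PRON" for tok in doc)
    nouns = sum(tok.pos_ in ("NOUN", "PROPN") for tok in doc)
    return pronouns / max(nouns, 1)  # avoid division by zero

# A vague, pronoun-heavy description scores higher than a concrete one:
print(pronoun_noun_ratio("She gave it to him and he put it over there."))
print(pronoun_noun_ratio("The boy hands a cookie to his sister."))
```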

[2] interestingly achieved great results using just a person’s speech rate and turn-taking patterns. They did obtain fewer false positives by adding other features but stuck to content-free features, as mentioned. This means that their model does not depend on a specific language and can therefore be used on a global scale.

[1] extracted features that are known to be impacted by AD and additionally noted that patients’ vocabulary and semantic processing had declined.

[1] listed the following transcription-based features (a few of which are sketched in code after the lists below):

  • Lexical Richness
  • Utterance Length
  • Frequency of Filler Words
  • Frequency of Pronouns
  • Frequency of Verbs
  • Frequency of Adjectives
  • Frequency of Proper Nouns

and [1] listed the following acoustic features:

  • Word Finding Errors
  • Fluidity
  • Rhythm of Speech
  • Pause Frequency
  • Duration
  • Speech Rate
  • Articulation Rate
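
Here is a sketch of how three of the transcription-based features might be computed. The definitions (a type-token ratio for lexical richness, a hand-picked filler list) are plausible simplifications of mine, not [1]’s exact formulations:

```python
# Sketches of three of [1]'s transcription-based features; the paper's
# exact definitions (e.g. normalization) may differ.
FILLERS = {"um", "uh", "er", "well", "like"}  # illustrative filler list

def lexical_richness(tokens):
    """Type-token ratio: distinct words / total words."""
    return len(set(tokens)) / len(tokens)

def mean_utterance_length(utterances):
    """Average number of words per utterance."""
    return sum(len(u.split()) for u in utterances) / len(utterances)

def filler_frequency(tokens):
    """Share of tokens that are filler words."""
    return sum(t in FILLERS for t in tokens) / len(tokens)

utts = ["um the boy is uh taking the the cookie",
        "and the water is um running over"]
tokens = " ".join(utts).lower().split()
print(lexical_richness(tokens), mean_utterance_length(utts),
      filler_frequency(tokens))
```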

Brilliantly, [3] performed several feature selection methods upon the following features:

(feature list from [3])

Upon all of these features, they implemented three feature selection methods, each selecting the top eight features: Information Gain, KNN, and SVM Recursive Feature Elimination (SVM-RFE).

These methods selected the following:

(feature selection results from [3])

Three features were selected by all three methods, suggesting that they’re highly predictive for detecting AD: Word Errors, Number of Prepositions, and Number of Repetitions.

It’s also important to restate that the most accurate model used the features selected by the KNN method.

Overall, we have many features identified in this section to pay attention to. In particular, however (drawing on both the four papers and the Alzheimer’s Society videos), we need to focus on:

  • Word Errors
  • Repetition
  • Pronoun-Noun Ratio
  • Number of Prepositions
  • Speech Rate
  • Pause Frequency

Conclusion

We’ve previously looked into the current research towards making conversational systems more natural, and we now have a relatively short list of features that must be handled if conversational systems are to perform fluidly, even if the user has a cognitive impairment like AD.

Of course, this isn’t an exhaustive list, but it’s a good place to start and points me in the right direction for what to work on next. Stay tuned!
