
Why Stop at Voice?

Tom Campbell, Founder & President, FutureGrasp, LLC • Jun 19, 2018
Like what you read? CONTACT FutureGrasp for more information, speaking engagements and/or collaboration

Sitting in a coffee shop this morning, I heard a man at the table next to me take a call on his smartphone. He proceeded to speak so loudly that every patron could hear the nuances of his small business. It was distracting enough that I left the comfortable environment earlier than I’d planned. Incidents like this make me wonder whether voice will really become the dominant way we control our computers. Voice is all the buzz (no pun intended) in Silicon Valley. Many technology leaders (Apple, Amazon, Microsoft, etc.) are betting that verbal commands will soon be the primary means of communicating with our numerous devices. But is verbal chatter the end point for our computer communications? Are we destined solely for the Star Trek Enterprise’s command prompts, stating the contemporary equivalent, even in public, of “Computer, take me to the Bridge”?

TODAY’S VOICE TECH

We naturally try to emulate our biological capabilities in the computer. Why wouldn’t we want to simply talk to our devices and have them both accurately record our words and chat with us as a human would? Driven by intense interest in deploying voice commands, tremendous advances have been made since the early days of voice recognition. Algorithms powered by artificial intelligence (AI) can now achieve up to 95 percent accuracy in understanding speech, on par with humans. [1] Depending on how fast one can type versus speak, voice commands offer the potential to be more efficient, and perhaps more accurate, than the output from our standard QWERTY keyboards. [2]
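
To make the idea concrete, here is a minimal dictation sketch in Python. It assumes the third-party SpeechRecognition package (and a working microphone) and sends the clip to Google’s free web speech API; it is an illustration of how simple basic voice capture has become, not a description of any particular vendor’s product.

```python
# Minimal dictation sketch. Assumes: pip install SpeechRecognition pyaudio
import speech_recognition as sr

recognizer = sr.Recognizer()

with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate against background noise
    print("Speak now...")
    audio = recognizer.listen(source)            # record one utterance

try:
    # Send the clip to Google's free web speech API and print the transcript.
    text = recognizer.recognize_google(audio)
    print("You said:", text)
except sr.UnknownValueError:
    print("Could not understand the audio.")
except sr.RequestError as err:
    print("Recognition service unavailable:", err)
```

In quiet conditions a snippet like this transcribes short phrases remarkably well; the challenges discussed below are about everything that happens outside those quiet conditions.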

Top tech firms are embracing voice: Amazon has Alexa, Apple has Siri, and Microsoft has Cortana. We can talk to small receiving stations such as Amazon’s Echo and Google’s Home and order just about anything, all with our natural voice. Companies such as IPsoft have even captured a real person’s likeness and digitized it into enterprise software that can answer mundane insurance and human-resources questions, repeatedly and for millions of customers simultaneously. [3] Despite these advances and promises, voice faces challenges that may compromise its full deployment.

CHALLENGES WITH VOICE

Even we humans frequently misinterpret one another, so it’s no surprise that algorithms also make mistakes. Several linguistic challenges come to mind that could be onerous for a machine to overcome completely:

Noisy environments. The “cocktail party problem” is encountered frequently at conference receptions. Each of us has a unique voice, even if it changes a little with mood, fatigue or alcohol. Throw in a crowd of other voices and background noise, however, and we can struggle to follow the one voice we want to focus on, even if the speaker is right next to us. And that is with a lifetime of practice; for an algorithm with little such experience, the situation is even harder. To cut through the cacophony, one approach by Google incorporates both audio and visual cues using a machine learning model trained on thousands of YouTube videos. [4] Further advances may be needed to avoid misinterpreting a speaker in noisy environments.
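
Google’s audio-visual model is far more sophisticated, but the core move in most separation systems is estimating a mask over a time-frequency representation of the mixed audio. The toy sketch below (synthetic tones standing in for voices, and an “ideal” mask computed from the known sources) is only meant to show what such a mask does; it is not Google’s algorithm, which learns the mask from audio and video with a neural network.

```python
# Toy mask-based source separation on synthetic signals.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
t = np.arange(fs * 2) / fs                       # two seconds of audio
speaker_a = np.sin(2 * np.pi * 220 * t)          # stand-in for the target voice
speaker_b = np.sin(2 * np.pi * 700 * t)          # stand-in for an interfering voice
mixture = speaker_a + speaker_b + 0.05 * np.random.randn(t.size)

# Short-time Fourier transforms of the sources (known here only because this
# is a toy) and of the mixture.
_, _, A = stft(speaker_a, fs=fs, nperseg=512)
_, _, B = stft(speaker_b, fs=fs, nperseg=512)
_, _, M = stft(mixture, fs=fs, nperseg=512)

# Ideal ratio mask: the fraction of each time-frequency cell that belongs to
# speaker A. Real systems estimate this mask with a trained model.
mask = np.abs(A) / (np.abs(A) + np.abs(B) + 1e-8)

# Apply the mask to the mixture and invert back to a waveform.
_, separated = istft(mask * M, fs=fs, nperseg=512)
n = min(separated.size, speaker_a.size)
print("correlation with clean target:",
      round(float(np.corrcoef(separated[:n], speaker_a[:n])[0, 1]), 3))
```

A learned mask lets a system keep the time-frequency cells dominated by the target speaker and suppress everything else, which is why adding visual cues about whose lips are moving helps so much.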

Confusing command prompts. A recent story highlights the dangers of placing complete trust in a verbal command system. A family with an Amazon Echo had a private conversation recorded and sent to one of their contacts several miles away, without the family’s knowledge. Per Amazon:

“Echo woke up due to a word in background conversation sounding like ‘Alexa’. Then, the subsequent conversation was heard as a ‘send message’ request. At which point, Alexa said out loud ‘To whom?’ At which point, the background conversation was interpreted as a name in the customer’s contact list. Alexa then asked out loud, ‘[contact name], right?’ Alexa then interpreted background conversation as ‘right’.” [5]

Better safeguards – for example, double-prompts, or additional codes and keywords – may be needed for such systems to ensure conversations meant to remain private are kept private.
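
As a sketch of what such a safeguard could look like, the snippet below walks a “send message” request through recipient, read-back and an explicit confirmation phrase before anything is sent. The states and phrases are hypothetical, not Amazon’s actual design.

```python
# Sketch of a double-confirmation flow for a voice assistant "send message"
# intent; state names and the confirmation phrase are illustrative only.
CONFIRM_PHRASE = "yes, send it"

def handle_send_message(transcripts):
    """Walk through recipient -> message body -> explicit confirmation."""
    state, recipient, body = "ask_recipient", None, None
    for heard in transcripts:
        if state == "ask_recipient":
            recipient, state = heard, "ask_body"
        elif state == "ask_body":
            body, state = heard, "confirm"
        elif state == "confirm":
            # Require the exact confirmation phrase, not just any word like "right".
            if heard.strip().lower() == CONFIRM_PHRASE:
                return f"Sent to {recipient}: {body}"
            return "Cancelled: no explicit confirmation heard."
    return "Cancelled: conversation ended before confirmation."

# Background chatter that never says the confirmation phrase is dropped.
print(handle_send_message(["John Smith", "see you at eight", "right"]))
print(handle_send_message(["John Smith", "see you at eight", "yes, send it"]))
```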

Accents. Accents are one of the joys of language, but they can also be a burden for an algorithm to interpret. Amazon Alexa, Google Home, and Apple Siri test surprisingly well against speakers with different accents saying identical phrases, but there are still occasional failures. [6] Interpreting accents may be compounded by other factors such as background noise.

Slang and dialects. Two beautiful aspects of language are slang and dialects, which can be almost entirely distinct from the mother tongue. In German, for example, ‘Umgangssprache’ (colloquial speech) can be an intense experience even for a fluent speaker. Throw in the differences among ‘Hochdeutsch’ (high German), ‘Plattdeutsch’ (low German), ‘Schweizerdeutsch’ (Swiss German) and other dialects that vary even from village to neighboring village, and one can become confused quickly. Prior to my post-doctoral studies in Germany, my wife and I spent four months intensively studying Hochdeutsch and felt fluent enough to go into stores and order items without issue. Then we moved to a village 10 kilometers outside the city where we had learned German and went to the local grocery store. It seemed they had a different word or pronunciation for almost every item we had ordered easily back in the big city. The issue was a dialect, almost distinct from Hochdeutsch, that we had not been taught. Voice algorithms working in such environments would have to learn all of these nuances to be proficient.

Language change. Another great thing about languages is that they are never static. New words are constantly being introduced, especially around technology. Words also change their root meaning through so-called ‘soft changes’: “Words tend to pick up different usages and even meanings over time, often very remarkably. The word terrific used to have a highly negative meaning—something that terrifies. Only recently has it become a positive term.” [7] Just as humans must adapt to a morphing language, algorithms will have to maintain constantly updated dictionaries and stay alert for words used in ways that differ from common usage.

Foreign languages. There are almost 7,000 recognized languages worldwide, although within this Tower of Babel only about 10 or 20 are prominently used around the globe. [8], [9] Roughly two billion people speak just three languages: Chinese (including its roughly 10 varieties), Spanish and English. [10] Nevertheless, to truly become the main, go-to approach for computer commands, must our algorithms become the equivalent of Star Wars’ C-3PO, whose fictional abilities accommodated over six million galactic tongues? Would adding more languages also compound interpretation errors among them? Could a word pronounced by a Mandarin speaker, for example, be misidentified, and thus mistranslated, as Korean? When Amazon debuted Alexa in France, it had to account for the occasional American English word creeping into French conversation, and more data is critical to capture nuances such as the multiple versions of “you” and other honorifics in Romance languages. [11]
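
Before interpreting anything, a multilingual assistant first has to decide which language it is hearing. Production systems use large acoustic and text models for this; the sketch below is only a toy character-trigram classifier over a few hand-picked seed sentences, meant to show the flavor of the problem rather than any vendor’s approach.

```python
# Toy written-language identifier using character-trigram overlap; real
# systems use far larger models and also work on audio, not text.
from collections import Counter

SEED_TEXT = {  # tiny hand-picked samples, for illustration only
    "english": "the quick brown fox jumps over the lazy dog and says hello",
    "spanish": "el zorro marrón salta sobre el perro perezoso y dice hola",
    "german":  "der schnelle braune fuchs springt über den faulen hund hallo",
}

def trigrams(text):
    text = " " + text.lower() + " "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

PROFILES = {lang: trigrams(sample) for lang, sample in SEED_TEXT.items()}

def guess_language(sentence):
    """Return the seed language whose trigram profile overlaps most."""
    grams = trigrams(sentence)
    scores = {lang: sum((grams & profile).values())
              for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get)

print(guess_language("hola, el perro dice algo"))     # expected: spanish
print(guess_language("hello there, says the dog"))    # expected: english
```

Even this toy hints at why closely related languages, loan words and code-switching (that American English word creeping into French) make identification harder as more languages are added.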

BEYOND VOICE

Aside from writing or typing, we have many non-verbal ways of communicating: facial expressions, eye motions, musculoskeletal movements, even our thoughts themselves can be harnessed by a computer to convey what we mean. Brain-computer interfaces (BCIs) have been researched for decades in attempts to tease out our thoughts. As with audio commands, however, each of these approaches has its challenges. The ability of an algorithm to interpret our thoughts is a research avenue that has attracted substantial interest, especially in the last few years. With the goal of discussing technologies that could be more widely accepted in the near term, we describe below only ex-vivo (outside the body) capabilities. [12]

Facial expressions can be used to convey basic wishes. A Brazilian team is working to use facial expressions to control a wheelchair for individuals who lack other means of controlling their mobility. “The camera can identify more than 70 facial points around the mouth, nose and eyes. By moving these points, it is possible to get simple commands, such as forward, backward, left or right and, most importantly, stop.” [13]
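
The mapping from tracked facial points to chair commands can be as simple as thresholding a few landmark displacements. The sketch below is a hypothetical illustration of that idea; the landmark names, gestures and thresholds are invented, not the Brazilian team’s software.

```python
# Hypothetical mapping from facial-landmark displacements to wheelchair
# commands; landmark names, gestures and thresholds are illustrative only.
def command_from_landmarks(landmarks):
    """landmarks: dict of measurements relative to a calibrated neutral face.
    'mouth_center' is an (dx, dy) pixel displacement; eye openness is 0..1."""
    mouth_dx, mouth_dy = landmarks.get("mouth_center", (0.0, 0.0))
    left_eye_open = landmarks.get("left_eye_openness", 1.0)
    right_eye_open = landmarks.get("right_eye_openness", 1.0)

    # A long blink of both eyes is the safety-critical stop gesture.
    if left_eye_open < 0.2 and right_eye_open < 0.2:
        return "stop"
    if mouth_dy < -10:      # mouth raised: go forward
        return "forward"
    if mouth_dy > 10:       # mouth lowered: go backward
        return "backward"
    if mouth_dx < -10:      # mouth pulled left: turn left
        return "left"
    if mouth_dx > 10:       # mouth pulled right: turn right
        return "right"
    return "hold"           # no clear gesture: keep the current state

print(command_from_landmarks({"mouth_center": (14.0, 2.0)}))   # -> right
print(command_from_landmarks({"left_eye_openness": 0.1,
                              "right_eye_openness": 0.1}))     # -> stop
```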

Tracking pupil oscillations (pupillometry) is another approach to non-verbal commands. Researchers in France and the Netherlands demonstrated in 2016 a new BCI method based on pupillometry. “In our method, participants covertly attend to one of several letters with oscillating brightness. Pupil size reflects the brightness of the selected letter, which allows us–with high accuracy and in real time–to determine which letter the participant intends to select. The performance of our method is comparable to the best covert-attention brain-computer interfaces to date, and has several advantages: no movement other than pupil-size change is required; no physical contact is required (i.e. no electrodes); it is easy to use; and it is reliable. Potential applications include: communication with totally locked-in patients, training of sustained attention, and ultra-secure password input.” [14]
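
The decoding step in that study amounts to correlating the measured pupil trace against each letter’s known brightness oscillation and choosing the best match. The simulation below is a simplified sketch of that idea using synthetic data; it is not the authors’ code.

```python
# Simplified simulation of pupillometry-based letter selection: each letter's
# brightness oscillates at a distinct frequency, and the pupil trace is
# matched against each oscillation by correlation. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
fs, duration = 50, 10.0                      # 50 Hz eye tracker, 10 s trial
t = np.arange(int(fs * duration)) / fs

letters = {"A": 0.7, "B": 1.1, "C": 1.5}     # brightness oscillation (Hz)
brightness = {k: np.sin(2 * np.pi * f * t) for k, f in letters.items()}

# Simulate a participant covertly attending to letter "B": the pupil
# (inversely) follows that letter's brightness, plus noise and a slow drift.
attended = "B"
pupil = (-0.6 * brightness[attended]
         + 0.3 * np.sin(2 * np.pi * 0.05 * t)
         + 0.5 * rng.standard_normal(t.size))

# Decode: the letter whose (negated) brightness correlates best with the pupil.
scores = {k: float(np.corrcoef(pupil, -b)[0, 1]) for k, b in brightness.items()}
decoded = max(scores, key=scores.get)
print("scores:", {k: round(v, 2) for k, v in scores.items()})
print("decoded letter:", decoded, "(attended:", attended + ")")
```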

Measurement of thoughts directly is also generating strong interest in the research community. Neuralink, the company Elon Musk founded in 2016, seeks to create a system that reads the electrical signals from one’s mind and thereby controls computers and merges humans with AI. [15] “Neuralink is developing ultra-high bandwidth brain-machine interfaces to connect humans and computers.” The venture raised over $27 million in late 2017. [16] A wonderful post on Wait But Why goes into detail on the underlying technology. [17]

In April 2018, a graduate student at MIT unveiled a system that uses electrical signals from facial and jaw muscles to capture internally vocalized words without audible speech. Called AlterEgo, the system was tested on 10 subjects and achieved 92 percent accuracy. “AlterEgo is a closed-loop, non-invasive, wearable system that allows humans to converse in high-bandwidth natural language with machines, artificial intelligence assistants, services, and other people... without externally observable movements—simply by vocalizing internally.” Potential applications include air traffic control and military communications. [18]
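
Stripped to its essentials, such a system maps a vector of muscle-signal features onto a small vocabulary. The sketch below is a toy nearest-centroid classifier over invented feature vectors; the real AlterEgo system is reported to use a neural network over multiple electrode channels, so this is only the flavor of the classification step.

```python
# Toy nearest-centroid classifier for silent-speech features; the feature
# vectors and vocabulary are invented for illustration, not AlterEgo's data.
import numpy as np

# Pretend training data: a few EMG feature vectors per internally vocalized word.
training = {
    "yes":  np.array([[0.9, 0.1, 0.2], [1.0, 0.2, 0.1], [0.8, 0.0, 0.3]]),
    "no":   np.array([[0.1, 0.9, 0.2], [0.2, 1.1, 0.1], [0.0, 0.8, 0.2]]),
    "stop": np.array([[0.2, 0.1, 1.0], [0.1, 0.2, 0.9], [0.3, 0.0, 1.1]]),
}

# One centroid per word; classification picks the nearest centroid.
centroids = {word: samples.mean(axis=0) for word, samples in training.items()}

def classify(features):
    dists = {word: float(np.linalg.norm(features - c))
             for word, c in centroids.items()}
    return min(dists, key=dists.get)

print(classify(np.array([0.85, 0.15, 0.25])))   # expected: "yes"
print(classify(np.array([0.15, 0.10, 0.95])))   # expected: "stop"
```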

IMPLICATIONS & CONCLUSIONS

Living in the mountains of Colorado, I rarely hear much traffic, planes or other noises that are common in more populous urban areas. But every time I travel to a big city, I am struck by the almost constant cacophony of sounds. Noise pollution is a recognized health issue. [19] While talking to a computer may not be a major contributor to noise, every decibel can be a distraction, whether it be in a coffee shop, in a cube farm at work, or at home with your spouse and kids. Non-verbal communication approaches have the potential to create a quieter world.

Individuals who have lost the ability to speak or otherwise communicate certainly could leverage non-audio BCI systems. People with paralysis, autism, speech impediments or other conditions that limit communication could benefit tremendously from a system well trained to their thoughts. Being trapped in one’s mind with no means of communication must be torture for victims of such diseases or accidents. BCIs have great potential to help them engage more with the world.

Perhaps in the future we may also extend the use of BCIs into the science-fiction realm of telepathy. Imagine being able to communicate non-verbally with relatives or friends, not only when near each other but perhaps from many miles apart. We already do this with texting, but using our thumbs still requires physical contact with a smartphone (and can be very dangerous while driving). A BCI such as those described above could enable more seamless communication in many situations.

Ultimately, instead of having to answer a cell phone call out loud in a coffee shop, might we soon be able to communicate effectively without saying a word? The technologies to get us there exist today in laboratories and test algorithms. It remains to be seen how soon such capabilities will be widely available. Sometime soon, we might instead think, “Computer, order me a triple latte.”

NOTES (all websites accessed June 19, 2018)


[1] K. Wiggers, “Qualcomm claims its on-device voice recognition is 95% accurate,” May 25, 2018, https://venturebeat.com/2018/05/25/qualcomm-claims-its-on-device-voice-recognition-is-95-accurate/

[2] QWERTY keyboards were initially designed to slow typists down so that manual typewriter keys would not jam. It’s funny how legacy technologies remain in modern use even when they are sub-optimal.

[3] S. Kesler, “Inside the bizarre human job of being a face for artificial intelligence,” June 5, 2017, https://qz.com/996906/inside-the-bizarre-human-job-of-being-a-face-for-artificial-intelligence/

[4] L. Tung, “Google AI can pick out a single speaker in a crowd: Expect to see it in tons of products,” April 13, 2018, https://www.zdnet.com/article/google-ai-can-pick-out-a-single-speaker-in-a-crowd-expect-to-see-it-in-tons-of-products/

[5] S. Wolfson, “Amazon's Alexa recorded private conversation and sent it to random contact,” May 24, 2018, https://www.theguardian.com/technology/2018/may/24/amazon-alexa-recorded-conversation

[6] M. Calore, “Watch People With Accents Confuse the Hell Out of AI Assistants,” May 16, 2017, https://www.wired.com/2017/05/ai-assistants-accented-english/

[7] A. Myers-Stanford, “How artificial intelligence can teach itself slang,” June 7, 2017, https://www.futurity.org/deep-learning-language-1452822-2/

[8] S.R. Anderson, “How many languages are there in the world?,” 2010, Linguistic Society of America, https://www.linguisticsociety.org/content/how-many-languages-are-there-world

[9] “List of languages by number of native speakers,” https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers

[10] J. Myers, “These are the world’s most spoken languages,” February 22, 2018, https://www.weforum.org/agenda/2018/02/chart-of-the-day-these-are-the-world-s-most-spoken-languages

[11] B. Barrett, “Inside Amazon's Painstaking Pursuit to Teach Alexa French,” June 13, 2018, https://www.wired.com/story/how-amazon-taught-alexa-to-speak-french/

[12] These examples are not intended to be an exhaustive review of the BCI field, but merely demonstrative of the art of the possible.

[13] A. Pasolini, “Wheelchair controlled by facial expressions to hit the market within 2 years,” May 19, 2016, https://newatlas.com/wheelchair-facial-commands/43206/

[14] S. Mathôt, J.-B. Melmi, L. van der Linden, and S. Van der Stigchel, “The Mind-Writing Pupil: A Human-Computer Interface Based on Decoding of Covert Attention through Pupillometry,” PLoS ONE 11(2): e0148805, 2016, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4743834/

[15] T. Lacoma, “Everything you need to know about Neuralink: Elon Musk’s brainy new venture,” November 7, 2017, https://www.digitaltrends.com/cool-tech/neuralink-elon-musk/

[17] T. Urban, “Neuralink and the Brain’s Magical Future,” April 20, 2017, https://waitbutwhy.com/2017/04/neuralink.html

[19] N. Lee, D. Anderson, J. Orwig, “Noise pollution is a bigger threat to your health than you may think, and Americans aren't taking it seriously,” January 26, 2018, http://www.businessinsider.com/noise-pollution-effects-human-hearing-health-quality-of-life-2018-1
