Richard Lea reports for the Guardian that Google has fed its AI program, Google Brain, 11,000 novels in order to improve the quality of its conversation. Unfortunately, Google took those 11,000 novels without notifying their authors (many of them were unpublished and freely available on the web), resulting in something of an ethical conundrum for authors. Read Lea in part below, or in full via the Guardian.
When the writer Rebecca Forster first heard how Google was using her work, it felt like she was trapped in a science fiction novel.
"Is this any different than someone using one of my books to start a fire? I have no idea," she says. "I have no idea what their objective is. Certainly it is not to bring me readers."
After a 25-year writing career, during which she has published 29 novels ranging from contemporary romance to police procedurals, the first instalment of her Josie Bates series, Hostile Witness, has found a new reader: Google's artificial intelligence.
"My imagination just didn't go as far as it being used for something like this," Forster says. "Perhaps that's my failure."
Forster's thriller is just one of 11,000 novels that researchers including Oriol Vinyals and Andrew M Dai at Google Brain have been using to improve the technology giant's conversational style. After feeding these books into a neural network, the system was able to generate fluent, natural-sounding sentences. According to a Google spokesman, who didn't want to be named, products such as the Google app will be "much more useful if they can capture the nuance of language better".
For the moment, the research is just a "proof of concept", the spokesman continues via email, but these methods "could help Google understand and produce a broader, more nuanced range of text for any given task".
"We could have used many different sets of data for this kind of training, and we have used many different ones for different research projects," he adds. "But in this case, it was particularly useful to have language that frequently repeated the same ideas, so the model could learn many ways to say the same thing; the language, phrasing and grammar in fiction books tends to be much more varied and rich than in most nonfiction books."
The only problem is that they didn't ask. The Google paper [PDF] says that the novels used in this research were taken from "the Books Corpus", citing a 2015 paper by Ryan Kiros and others [PDF] which describes how the authors "collected a corpus of 11,038 books from the web", describing them as "free books written by [as] yet unpublished authors". It's a collection that has been used by other researchers working in artificial intelligence and which is currently available for download in its entirety from the University of Toronto.
Forster says that she "always appreciates an interesting use of words", but while Hostile Witness is available to download for free, no one asked her permission to use her novel as raw material to train a computer.
"Perhaps I'm still thinking in the old way, that a reader will read my book; it didn't even occur to me that a machine could read my book. What I found curious was that these were referred to as 'free books written by as yet unpublished authors' because my state is very different," she says.
Like many of the novels in the Book Corpus collection, the edition of Hostile Witness used in the research was published on Smashwords and includes a copyright declaration that reserves "all rights", specifies that the ebook is "licensed for your personal enjoyment only", and offers the reader thanks for "respecting the hard work of this author". While Forster says she's no lawyer, the "spirit of this declaration is clear: you hope that your work would be respected by readers".
"I take great pride in my craft, and perhaps it was chosen because of that. Which would be great. Or perhaps it was chosen because it was there, because it was free?"