Google Books
Type of site | Digital library |
---|---|
Owner | Google |
Website | books.google.com |
Launched | October 2004 (as Google Print) |
Current status | Active |
Google Books (previously known as Google Book Search and Google Print) is a service from Google Inc. that searches the full text of books and magazines that Google has scanned, converted to text using optical character recognition (OCR), and stored in its digital database.[1] Books are provided either by publishers and authors, through the Google Books Partner Program, or by Google's library partners, through the Library Project.[2] Additionally, Google has partnered with a number of magazine publishers to digitize their archives.[3][4]
The Publisher Program was first known as 'Google Print' when it was introduced at the Frankfurt Book Fair in October 2004. The Google Books Library Project, which scans works in the collections of library partners and adds them to the digital inventory, was announced in December 2004.
The Google Books initiative has been hailed for its potential to offer unprecedented access to what may become the largest online body of human knowledge[5][6] and to promote the democratization of knowledge.[7] However, it has also been criticized for potential copyright violations[7][8] and for the lack of editing to correct the many errors introduced into the scanned texts by the OCR process.
As of October 2015, the number of scanned book titles was over 25 million, but the scanning process has slowed down in American academic libraries.[9][10] Google estimated in 2010 that there were about 130 million distinct titles in the world,[11][12] and stated that it intended to scan all of them.[11]
Details
Results from Google Books show up both in the universal Google Search and on the dedicated Google Books search website (books.google.com).
In response to search queries, Google Books allows users to view full pages from books in which the search terms appear, if the book is out of copyright or if the copyright owner has given permission. If Google believes the book is still under copyright, a user sees "snippets" of text around the queried search terms. All instances of the search terms in the book text appear with a yellow highlight.
The four access levels used on Google Books are:[13]
- Full view: Books in the public domain are available for "full view" and can be downloaded for free. In-print books acquired through the Partner Program are also available for full view if the publisher has given permission, although this is rare.
- Preview: For in-print books where permission has been granted, the number of viewable pages is limited to a "preview" set by a variety of access restrictions and security measures, some based on user-tracking. Usually, the publisher can set the percentage of the book available for preview.[14] Users are restricted from copying, downloading or printing book previews. A watermark reading "Copyrighted material" appears at the bottom of pages. All books acquired through the Partner Program are available for preview.
- Snippet view: A 'snippet view' – two to three lines of text surrounding the queried search term – is displayed in cases where Google does not have permission of the copyright owner to display a preview. This could be because Google cannot identify the owner or the owner declined permission. If a search term appears many times in a book, Google displays no more than three snippets, thus preventing the user from viewing too much of the book. Also, Google does not display any snippets for certain reference books, such as dictionaries, where the display of even snippets can harm the market for the work. Google maintains that no permission is required under copyright law to display the snippet view.[15]
- No preview: Google also displays search results for books that have not been digitized. As these books have not been scanned, their text is not searchable and only the metadata information such as the title, author, publisher, number of pages, ISBN, subject and copyright information, and in some cases, a table of contents and book summary is available. In effect, this is similar to an online library card catalog.[2]
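The snippet-view behaviour described above (a short excerpt around each match, capped at three per book) can be illustrated with a minimal sketch. This is not Google's implementation: the function name `snippets`, the character-based `context` window, and the sample text are assumptions made for illustration only.

```python
import re

MAX_SNIPPETS = 3  # cap taken from the description above


def snippets(text, term, context=60, limit=MAX_SNIPPETS):
    """Return up to `limit` short excerpts of `text` surrounding `term`.

    `context` is a hypothetical window size in characters; the article
    describes snippets as two to three lines of text, not characters.
    """
    results = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        if len(results) >= limit:
            break  # stop once the cap is reached, however many matches remain
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        results.append(text[start:end])
    return results


# Illustrative text in which the term occurs ten times.
page_text = "whale " * 10
print(len(snippets(page_text, "whale")))  # prints 3
```

Capping the number of excerpts, rather than their length alone, is what prevents a reader from reconstructing long passages by repeating a search.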
In response to criticism from groups such as the Association of American Publishers and the Authors Guild, Google announced an opt-out policy in August 2005, through which copyright owners could provide a list of titles that they did not want scanned, and Google would respect the request. Google also stated that it would not scan any in-copyright books between August and 1 November 2005, to provide the owners with the opportunity to decide which books to exclude from the Project. Thus, Google provides a copyright owner with three choices with respect to any work:[15]
- It can participate in the Partner Program to make a book available for preview or full view, in which case it would share revenue derived from the display of pages from the work in response to user queries.
- It can let Google scan the book under the Library Project and display snippets in response to user queries.
- It can opt out of the Library Project, in which case Google will not scan the book. If the book has already been scanned, Google will reset its access level as 'No preview'.
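As a rough summary, the four access levels and the three owner choices can be expressed as a small decision function. This is a simplified sketch of the rules as described in this section, not Google's actual logic: the parameter names are hypothetical, and special cases (such as reference books for which no snippets are shown) are ignored.

```python
from enum import Enum


class Access(Enum):
    FULL_VIEW = "Full view"
    PREVIEW = "Preview"
    SNIPPET = "Snippet view"
    NO_PREVIEW = "No preview"


def access_level(digitized, public_domain, in_partner_program=False,
                 full_view_permitted=False, opted_out=False):
    """Map a book's status to an access level (all parameters are assumptions)."""
    if not digitized or opted_out:
        # Only metadata is shown, as in a library card catalog.
        return Access.NO_PREVIEW
    if public_domain:
        return Access.FULL_VIEW
    if in_partner_program:
        # Publishers may allow full view, although this is described as rare.
        return Access.FULL_VIEW if full_view_permitted else Access.PREVIEW
    # In copyright, scanned under the Library Project, no permission given.
    return Access.SNIPPET


print(access_level(digitized=True, public_domain=False).value)  # prints Snippet view
```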
Each book on Google Books has an associated "About this book" page which displays analytical information regarding the book, such as a word map of the most used words and phrases, a selection of pages, a list of related books, a list of scholarly articles and other books that cite the book, and a table of contents. This information is collated through automated methods, and sometimes data from third-party sources is used.[16] This information provides an insight into the book, which is particularly useful when only a snippet view is available. The list of related books can often contain irrelevant entries.[17] In some cases, a book summary and information about the author are also displayed. The page also displays bibliographic information, which can be exported as citations in BibTeX, EndNote and RefMan formats. Registered users logged in with their Google accounts can post reviews for books on this page. Google Books also displays reviews from Goodreads alongside these reviews.[17]
Most scanned works are no longer in print or commercially available.[18] For those which are, the site provides links to the website of the publisher and booksellers.
Linking
Google Books can retrieve scanned books from URLs based on the International Standard Book Numbers (ISBN), Library of Congress Control Numbers (LCCN) and Online Computer Library Center (OCLC) record numbers. For example, the 'About this book' page of a book with the ISBN 0521349931 can be linked as books.google.com/books?vid=ISBN0521349931[19]
For some books, Google also provides the ability to link directly to the front cover, title page, copyright page, table of contents, index, and back cover of a book, by using an appropriate parameter. For example, the front cover of a book with the OCLC number 17546826 can be linked as books.google.com/books?vid=OCLC17546826&printsec=frontcover[19]
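The two link patterns above can be assembled programmatically. The following is a minimal sketch that only builds the URL strings shown in the examples; the function name `book_link` and the `https://` scheme are assumptions, and `frontcover` is the only `printsec` value confirmed by the text.

```python
from urllib.parse import urlencode

BASE_URL = "https://books.google.com/books"  # scheme is an assumption
ID_PREFIXES = ("ISBN", "LCCN", "OCLC")       # identifier types named above


def book_link(id_type, identifier, printsec=None):
    """Build a Google Books link from an ISBN, LCCN, or OCLC number."""
    id_type = id_type.upper()
    if id_type not in ID_PREFIXES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    params = {"vid": f"{id_type}{identifier}"}
    if printsec is not None:
        # Optional section of the book to open, e.g. "frontcover".
        params["printsec"] = printsec
    return f"{BASE_URL}?{urlencode(params)}"


print(book_link("ISBN", "0521349931"))
# prints https://books.google.com/books?vid=ISBN0521349931
print(book_link("OCLC", "17546826", printsec="frontcover"))
# prints https://books.google.com/books?vid=OCLC17546826&printsec=frontcover
```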
Page numbering
For many books, Google Books displays the original page numbers, resulting in many benefits such as enabling writers to prepare citations with page numbers without needing to have a print copy of the book. However, Tim Parks, writing in The New York Review Blogs, noted that Google had stopped providing page numbers for many recent publications, "presumably in alliance with the publishers, in order to force those of us who need to prepare footnotes to buy paper editions."[20]
Scanning of books
Many of the books are scanned using a customized Elphel 323 camera[21][22] at a rate of 1,000 pages per hour.[23] A patent awarded to Google in 2009 revealed that Google had developed a system for scanning books that uses two cameras and infrared light to automatically correct for the curvature of pages in a book. By constructing a 3D model of each page and then "de-warping" it, Google is able to present flat-looking pages without physically flattening them. Physical flattening requires either destructive methods such as unbinding or glass plates that flatten each page individually, which is inefficient for large-scale scanning.[24][25]
N-gram Viewer
The N-gram Viewer is a service connected to Google Books that graphs the frequency of word usage across the book collection. The service is important for historians and linguists, as it can provide an inside look into human culture through word use across time periods.[26] It has been criticized because of errors in the metadata it relies on.[27]
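The idea of graphing word frequency over time can be shown with a toy calculation. The sketch below is not the N-gram Viewer's method: the corpus, the whitespace tokenization, and the function names are all illustrative assumptions. It computes, for each year, the share of all n-grams of the same length that match a given phrase.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all consecutive n-token sequences in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(corpus_by_year, phrase):
    """Map each year to the phrase's share of that year's n-grams."""
    target = tuple(phrase.lower().split())
    n = len(target)
    series = {}
    for year, texts in sorted(corpus_by_year.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        series[year] = counts[target] / total if total else 0.0
    return series


# Made-up two-year corpus, for illustration only.
corpus = {
    1900: ["the whale swam and the whale dived"],
    1950: ["the ship sailed on"],
}
print(relative_frequency(corpus, "whale"))
```

Because each year's texts are grouped by their recorded publication date, a wrong date in the metadata places a book's words in the wrong year, which is why metadata errors matter for this kind of analysis.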
Errors
Google allows users to report errors in books at the website support.google.com/books/partner/troubleshooter/2983879.
Errors in content
The scanning process is subject to errors. For example, some pages may be unreadable, upside down, or in the wrong order. Scholars have even reported crumpled pages, obscuring thumbs and fingers, and smeared or blurry images.[28] On this issue, a declaration from Google at the end of scanned books says:
"The digitization at the most basic level is based on page images of the physical books. To make this book available as an ePub formated file we have taken those page images and extracted the text using Optical Character Recognition (or OCR for short) technology. The extraction of text from page images is a difficult engineering task. Smudges on the physical books' pages, fancy fonts, old fonts, torn pages, etc. can all lead to errors in the extracted text. Imperfect OCR is only the first challenge in the ultimate goal of moving from collections of page images to extracted-text based books. Our computer algorithms also have to automatically determine the structure of the book (what are the headers and footers, where images are placed, whether text is verse or prose, and so forth).
Getting this right allows us to render the book in a way that follows the format of the original book. Despite our best efforts you may see spelling mistakes, garbage characters, extraneous images, or missing pages in this book. Based on our estimates, these errors should not prevent you from enjoying the content of the book. The technical challenges of automatically constructing a perfect book are daunting, but we continue to make enhancements to our OCR and book structure extraction technologies."[29]
In 2009, Google stated that it would start using reCAPTCHA to help fix the errors found in Google Books scans. This method would only improve the recognition of scanned words that are hard to read because of the scanning process; it cannot solve errors such as turned pages or blocked words.[30]
Errors in metadata
Scholars have frequently reported rampant errors in the metadata on Google Books – including misattributed authors and erroneous dates of publication. Geoffrey Nunberg, a linguist researching changes in word usage over time, noticed that a search for books published before 1950 and containing the word "internet" turned up an unlikely 527 results. Woody Allen is mentioned in 325 books ostensibly published before he was born. Google responded to Nunberg by blaming the bulk of the errors on outside contractors.[27]
Timeline
2002: A group of team members at Google officially launch the “secret ‘books’ project.”[31] Google founders Sergey Brin and Larry Page had come up with the idea that later became Google Books in 1996, while still graduate students at Stanford. The history page on the Google Books website describes their initial vision for this project: “in a future world in which vast collections of books are digitized, people would use a ‘web crawler’ to index the books’ content and analyze the connections between them, determining any given book’s relevance and usefulness by tracking the number and quality of citations from other books.”[31] The team visited the sites of some of the larger digitization efforts of the time, including the Library of Congress’s American Memory Project, Project Gutenberg, and the Universal Library, to find out how they worked, as well as the University of Michigan, Page’s alma mater and the base for such digitization projects as JSTOR and Making of America. In a conversation with then-University President Mary Sue Coleman, Page learned that the university’s current estimate for scanning all of the library’s volumes was 1,000 years; he reportedly told Coleman that he “believes Google can help make it happen in six.”[31]
2003: The team works to develop a high-speed scanning process as well as software for resolving issues in odd type sizes, unusual fonts, and "other unexpected peculiarities."[31]
December 2004: Google signaled an extension to its Google Print initiative known as the Google Print Library Project.[32] Google announced partnerships with several high-profile university and public libraries, including the University of Michigan, Harvard (Harvard University Library), Stanford (Green Library), Oxford (Bodleian Library), and the New York Public Library. According to press releases and university librarians, Google planned to digitize and make available through its Google Books service approximately 15 million volumes within a decade. The announcement soon triggered controversy, as publisher and author associations challenged Google's plans to digitize, not just books in the public domain, but also titles still under copyright.
September–October 2005: Two lawsuits against Google charged that the company had not respected copyrights and had failed to properly compensate authors and publishers. One was a class action suit on behalf of authors (Authors Guild v. Google, Sept. 20, 2005) and the other was a civil lawsuit brought by five large publishers and the Association of American Publishers (McGraw Hill v. Google, Oct. 19, 2005).[8][33][34][35][36][37]
November 2005: Google changed the name of this service from Google Print to Google Book Search.[38] Its program enabling publishers and authors to include their books in the service was renamed Google Books Partner Program,[39] and the partnership with libraries became Google Books Library Project.
2006: Google added a "download a pdf" button to all its out-of-copyright, public domain books. It also added a new browsing interface along with new "About this Book" pages.[31]
August 2006: The University of California System announced that it would join the Books digitization project. This includes a portion of the 34 million volumes within the approximately 100 libraries managed by the System.[40]
September 2006: The Complutense University of Madrid became the first Spanish-language library to join the Google Books Library Project.[41]
October 2006: The University of Wisconsin–Madison announced that it would join the Book Search digitization project along with the Wisconsin Historical Society Library. Combined, the libraries have 7.2 million holdings.[42]
November 2006: The University of Virginia joined the project. Its libraries contain more than five million volumes and more than 17 million manuscripts, rare books and archives.[43]
January 2007: The University of Texas at Austin announced that it would join the Book Search digitization project. At least one million volumes would be digitized from the university's 13 library locations.
March 2007: The Bavarian State Library announced a partnership with Google to scan more than a million public domain and out-of-print works in German as well as English, French, Italian, Latin, and Spanish.[44]
May 2007: A book digitizing project partnership was announced jointly by Google and the Cantonal and University Library of Lausanne.[45]
May 2007: The Boekentoren Library of Ghent University announced that it would participate with Google in digitizing and making digitized versions of 19th century books in the French and Dutch languages available online.[46]
June 2007: The Committee on Institutional Cooperation (rebranded as the Big Ten Academic Alliance in 2016) announced that its twelve member libraries would participate in scanning 10 million books over the course of the next six years.[47]
July 2007: Keio University became Google's first library partner in Japan with the announcement that they would digitize at least 120,000 public domain books.[48]
August 2007: Google announced that it would digitize up to 500,000 items, both copyrighted and public domain, from Cornell University Library. Google would also provide a digital copy of all works scanned, to be incorporated into the university's own library system.[49]
September 2007: Google added a feature that allows users to share snippets of books that are in the public domain. The snippets may appear exactly as they do in the scan of the book, or as plain text.[50]
September 2007: Google debuted a new feature called "My Library" which allows users to create personal customized libraries, selections of books that they can label, review, rate, or full-text search.[51]
December 2007: Columbia University was added as a partner in digitizing public domain works.[52]
May 2008: Microsoft tapered off and planned to end its scanning project, which had reached 750,000 books and 80 million journal articles.[53]
October 2008: A settlement was reached between the publishing industry and Google after two years of negotiation. Google agreed to compensate authors and publishers in exchange for the right to make millions of books available to the public.[8][54]
November 2008: Google reached the 7 million mark for books scanned by Google and by its publishing partners. 1 million were in full preview mode and 1 million were fully viewable and downloadable public domain works. About five million were out of print.[18][55][56]
December 2008: Google announced the inclusion of magazines in Google Books. Titles include New York Magazine, Ebony, and Popular Mechanics.[57][58]
February 2009: Google launched a mobile version of Google Book Search, allowing iPhone and Android phone users to read over 1.5 million public domain works in the US (and over 500,000 outside the US) using a mobile browser. Instead of page images, the plain text of the book is displayed.[59]
May 2009: At the annual BookExpo convention in New York, Google signaled its intent to introduce a program that would enable publishers to sell digital versions of their newest books direct to consumers through Google.[60]
December 2009: A French court shut down the scanning of copyrighted books published in France, saying this violated copyright laws. It was the first major legal loss for the scanning project.[61]
April 2010: Visual artists, who were not included in the previous lawsuit and settlement, became the plaintiff groups in another lawsuit and said they intended to bring more than just Google Books under scrutiny. "The new class action," read the statement, "goes beyond Google's Library Project, and includes Google's other systematic and pervasive infringements of the rights of photographers, illustrators and other visual artists."[62]
May 2010: It was reported that Google would launch a digital book store called Google Editions.[63] With its own e-book store, it would compete with Amazon, Barnes & Noble, Apple and other electronic book retailers. Unlike the others, Google Editions would be completely online and would not require a specific device (such as a Kindle, Nook, or iPad).
June 2010: Google passed 12 million books scanned.[11]
August 2010: It was announced that Google intended to scan all 129,864,880 known existing books within a decade, amounting to over 4 billion digital pages and 2 trillion words in total.[11]
December 2010: Google eBooks (Google Editions) was launched in the US.[64]
December 2010: Google launched the N-gram Viewer, which collects and graphs data on word usage across its book collection.[26]
March 2011: A federal judge rejected the settlement reached between the publishing industry and Google.[65]
March 2012: Google passed 20 million books scanned.[66][67]
March 2012: Google reached a settlement with publishers.[68]
January 2013: The documentary Google and the World Brain was shown at the Sundance Film Festival.[69]
November 2013: Ruling in Authors Guild v. Google, US District Judge Denny Chin sides with Google, citing fair use.[70] The authors said they would appeal.[71]
October 2015: The appeals court sided with Google, declaring that Google did not violate copyright law.[72] According to the New York Times, Google has scanned more than 25 million books.[9]
April 2016: The US Supreme Court declined to hear the Authors Guild's appeal, which means the lower court's decision stood, and Google would be allowed to scan library books and display snippets in search results without violating the law.[73]
Google Books Partner Program
The Partner Program is an online book-marketing program designed to help publishers and authors promote their books. Publishers and authors submit either a digital copy of their book in EPUB or PDF format, or a print copy to Google, which is made available on Google Books for preview. The publisher can control the percentage of the book available for preview, with the minimum being 20%. They can also choose to make the book fully viewable, and even allow users to download a PDF copy.
Books can also be made available for sale on Google Play.[2]
Google Books Library Project
The Google Books Library Project is an effort by Google to scan and make searchable the collections of several major research libraries.[74] Along with bibliographic information, snippets of text from a book are often viewable. If a book is out of copyright and in the public domain, the book is fully available to read or download.[13]
In-copyright books scanned through the Library Project are made available on Google Books for snippet view. Regarding the quality of scans, Google acknowledges that they are "not always of sufficiently high quality" to be offered for sale on Google Play. Also, because of supposed technical constraints, Google does not replace scans with higher quality versions that may be provided by the publishers.[19]
The project was the subject of the Authors Guild v. Google lawsuit, filed in 2005 and decided in favor of Google in 2013 and again, on appeal, in 2015.
Copyright owners can claim the rights for a scanned book and make it available for preview or full view (by "transferring" it to their Partner Program account), or request Google to prevent the book text from being searched.[19]
The number of institutions participating in the Library Project has grown since its inception.[32] The University of Mysore has been mentioned in many media reports as being a library partner,[75][76] although it is not listed as a partner by Google.[77]
Initial partners
- The Harvard University Library and Google conducted a pilot throughout 2005. The project continued, with the aim of increasing online access to the holdings of the Harvard University Library, which includes more than 15.8 million volumes. While physical access to Harvard's library materials is generally restricted to current Harvard students, faculty, and researchers, or to scholars who can come to Cambridge, the Harvard-Google Project has been designed to enable both members of the Harvard community and users everywhere to discover works in the Harvard collection.
- In this pilot program, NYPL is working with Google to offer a collection of its public domain books, which will be scanned in their entirety and made available for free to the public online. Users will be able to search and browse the full text of these works. When the scanning process is complete, the books may be accessed from both The New York Public Library's website and from the Google search engine.[81]
- University of Oxford, Bodleian Library[82]
- Stanford University, Stanford University Libraries (SULAIR)[83]
Additional partners
Other institutional partners have joined the Project since the partnership was first announced:
- Austrian National Library[84]
- Bavarian State Library[85]
- Bibliothèque municipale de Lyon[86]
- Big Ten Academic Alliance[47]
- Columbia University, Columbia University Library System[87]
- Complutense University of Madrid[85][88]
- Cornell University, Cornell University Library[89]
- Ghent University, Ghent University Library/Boekentoren[85][90]
- Keio University, Keio Media Centers (Libraries)[91]
- National Library of Catalonia, Biblioteca de Catalunya[92]
- Princeton University, Princeton University Library[93]
- University of California, California Digital Library[94]
- University of Lausanne, Cantonal and University Library of Lausanne[85][95]
- University of Mysore, Mysore University Library[96]
- University of Texas at Austin, University of Texas Libraries[97]
- University of Virginia, University of Virginia Library[98]
- University of Wisconsin–Madison, University of Wisconsin Libraries[99]
My Library
Google Books allows signed-in users to create a personalized collection or a library of books. Organized through "bookshelves", books can be added to the library using a button that appears along with search results or from the "Overview" page of books. The library can be shared with friends by making bookshelves publicly visible and sharing the private library URL. Users can also import a list of books to the library using their ISBN or ISSN numbers. There are four default bookshelves which cannot be renamed: "Favorites", "Reading now", "To read" and "Have read".[100][101] The library also has default bookshelves ("Purchased", "Reviewed", "My Books on Google Play", "Recently viewed", "Browsing history", and "Books for you") to which books get added automatically. Users cannot add or remove books from these bookshelves.
Criticism
Copyright infringement, fair use and related issues
Through the project, library books were being digitized somewhat indiscriminately regardless of copyright status, which led to a number of lawsuits against Google. By the end of 2008, Google had reportedly digitized over seven million books, of which only about one million were works in the public domain. Of the rest, one million were in copyright and in print, and five million were in copyright but out of print. In 2005, a group of authors and publishers brought a major class-action lawsuit against Google for infringement on the copyrighted works. Google argued that it was preserving "orphaned works" – books still under copyright, but whose copyright holders could not be located.[102]
The Authors Guild and Association of American Publishers separately sued Google in 2005 for its book project, citing "massive copyright infringement." Google countered that its project represented a fair use and is the digital age equivalent of a card catalog with every word in the publication indexed.[8] The lawsuits were consolidated, and eventually a settlement was proposed. The settlement received significant criticism on a wide variety of grounds, including antitrust, privacy, and inadequacy of the proposed classes of authors and publishers. The settlement was eventually rejected,[103] and the publishers settled with Google soon after. The Authors Guild continued its case, and in 2011 their proposed class was certified. Google appealed that decision, with a number of amici asserting the inadequacy of the class, and the Second Circuit rejected the class certification in July 2013, remanding the case to the District Court for consideration of Google's fair use defense.[104]
In 2015, the Authors Guild filed another appeal against Google, to be considered by the 2nd U.S. Circuit Court of Appeals in New York. Google won the case unanimously, based on the argument that it was showing people only snippets rather than full texts and was not allowing people to read the books illegally.[105] The court stated that Google had not infringed copyright law, as its use was protected as fair use.[106]
The Authors Guild tried again in 2016 to appeal the decision, this time taking its case to the Supreme Court. The Court declined to hear the case, leaving the Second Circuit's decision intact, meaning that Google did not violate copyright laws.[107] The case also set a precedent for similar cases concerning fair use, as it further clarified and expanded the law. Such clarification is important in the digital age, as it affects other scanning projects similar to Google's.[105]
Other lawsuits followed the Authors Guild's lead. In 2006 a German lawsuit, previously filed, was withdrawn.[108] In June 2006, Hervé de la Martinière,[109] a French publisher known as La Martinière and Éditions du Seuil,[110] announced its intention to sue Google France.[111] In 2009, the Paris Civil Court awarded 300,000 EUR (approximately 430,000 USD) in damages and interest and ordered Google to pay 10,000 EUR a day until it removed the publisher's books from its database.[110][112] The court wrote, "Google violated author copyright laws by fully reproducing and making accessible" books that Seuil owns without its permission[110] and that Google "committed acts of breach of copyright, which are of harm to the publishers".[109] Google said it would appeal.[110] Syndicat National de l'Edition, which joined the lawsuit, said Google had scanned about 100,000 French works under copyright.[110]
In December 2009, Chinese author Mian Mian filed a civil lawsuit for $8,900 against Google for scanning her novel, Acid Lovers. This is the first such lawsuit to be filed against Google in China.[113] Also, in November that year, the China Written Works Copyright Society (CWWCS) accused Google of scanning 18,000 books by 570 Chinese writers without authorization. Google agreed on Nov 20 to provide a list of Chinese books it had scanned, but the company refused to admit having "infringed" copyright laws.[114]
In March 2007, Thomas Rubin, associate general counsel for copyright, trademark, and trade secrets at Microsoft, accused Google of violating copyright law with their book search service. Rubin specifically criticized Google's policy of freely copying any work until notified by the copyright holder to stop.[115]
Google's licensing of public domain works is also an area of concern because of the use of digital watermarking techniques on the books. Some published works that are in the public domain, such as all works created by the U.S. federal government, are still treated like other works under copyright, and therefore locked after 1922.[116]
Academic criticism
For many scholars, a far more egregious problem with the project is that it does not seem to be meeting its fundamental stated goal of "preserving" orphaned and out-of-print works. Google has apparently been passing huge numbers of scanned and electronic-text books into circulation without editing the texts for errors introduced by the digitizing processes. This problem has been apparent for a number of years,[117][118][119] but became especially visible in 2014, when Google formed a partnership with bookseller Barnes & Noble,[120] through which Google made more than half a million public domain texts available to Barnes & Noble, to be offered for free on the Nook Shop for its e-readers.
Customers downloading these books discovered that as many as 80% of them were essentially unreadable, riddled with a large number of errors introduced either by the scanning process itself or by the conversion of the scans into electronic texts via optical character recognition (OCR) software. Google apparently scanned the texts but did not trouble to edit them for errors, and Barnes & Noble compounded the problem by not exercising any quality control over the Google texts, simply offering them unexamined in its shop to customers. The effect of the scan and OCR errors is to render the contents of many books essentially unreadable, a problem especially in scientific works, where incorrectly rendered, missing, or extraneous characters in scientific equations may render them meaningless.[121][122]
Also of concern are the large numbers of metadata errors in the Google collection. Metadata refers to the information which identifies a particular text: title, author, publisher, publication place and date, subject classification, etc. – essentially the information which would be found in a library card catalog.
One investigator found thousands of such errors in the samples he took, including publication dates that predated the birth of the books' author (e.g., 182 works by Charles Dickens supposedly published prior to his birth in 1812), wildly inappropriate subject classifications (an edition of Moby Dick found under "computers", a biography of Mae West classified under "religion"), conflicting classifications (10 editions of Whitman's Leaves of Grass all classified as both "Fiction" and "Nonfiction"), incorrectly spelled titles, authors, and publishers (Moby Dick : or the White "Wall"), metadata for one book incorrectly appended to a completely different book (the metadata for an 1818 mathematical work leads to a 1963 romance novel), books about the Internet with publication dates before the Internet existed, and many more.[123][124]
- "A review of the author, title, publisher, and publication year metadata elements for 400 randomly selected Google Books records was undertaken. The results show 36% of sampled books in the digitization project contained metadata errors. This error rate is higher than one would expect to find in a typical library online catalog."[125]
- "The overall error rate of 36.75% found in this study suggests that Google Books’ metadata has a high rate of error. While “major” and “minor” errors are a subjective distinction based on the somewhat indeterminate concept of “findability”, the errors found in the four metadata elements examined in this study should all be considered major."[126]
Such metadata errors can make serious research using the Google Books Project database difficult if not impossible – even assuming all of the scanned texts were edited and error-corrected. To date, Google has shown only limited interest in cleaning up these errors.[127]
Language issues
Some European politicians and intellectuals have criticized Google's effort on grounds of linguistic imperialism. They argue that because the vast majority of books proposed to be scanned are in English, the project will result in disproportionate representation of natural languages in the digital world. German, Russian, French, and Spanish, for instance, are popular languages in scholarship. The disproportionate online emphasis on English, however, could shape access to historical scholarship and, ultimately, the growth and direction of future scholarship. Among these critics is Jean-Noël Jeanneney, the former president of the Bibliothèque nationale de France.[128]
Google Books versus Google Scholar
While Google Books has digitized large numbers of journal back issues, its scans do not include the metadata required for identifying specific articles in specific issues. This has led the makers of Google Scholar to start their own program to digitize and host older journal articles (in agreement with their publishers).[129]
Similar projects
- Project Gutenberg is a volunteer effort to digitize and archive cultural works, to "encourage the creation and distribution of eBooks". It was founded in 1971 by Michael S. Hart and is the oldest digital library. As of 3 October 2015, Project Gutenberg had reached 50,000 items in its collection.
- Internet Archive is a non-profit which digitizes over 1000 books a day and mirrors books from Google Books and other sources. As of May 2011, it hosted over 2.8 million public domain books, more than the approximately 1 million public domain books at Google Books.[130] Open Library, a sister project of Internet Archive, lends 80,000 scanned and purchased commercial ebooks to the visitors of 150 libraries.[131]
- HathiTrust has maintained the HathiTrust Digital Library since 13 October 2008.[132] It preserves and provides access to material scanned by Google, some of the Internet Archive books, and some scanned locally by partner institutions. As of May 2010, it included about 6 million volumes, over 1 million of which are public domain (at least in the US).
- Microsoft funded the scanning of 300,000 books to create Live Search Books in late 2006. It ran until May 2008, when the project was abandoned[133] and the books were made freely available on the Internet Archive.[134]
- Europeana links to roughly 10 million digital objects as of 2010, including video, photos, paintings, audio, maps, manuscripts, printed books, and newspapers from the past 2,000 years of European history from over 1,000 archives in the European Union.[135][136]
- Gallica, from the French National Library, links to about 800,000 digitized books, newspapers, manuscripts, maps, and drawings. Created in 1997, the digital library continues to expand at a rate of about 5000 new documents per month. Since the end of 2008, most of the newly scanned documents have been available in image and text formats. Most of these documents are written in French.
- Wikisource
- Runivers
See also
- A9.com, Amazon.com's book search
- Book Rights Registry
- Digital library
- List of digital library projects
- Universal library
Notes
- ↑ A sample "About this book" page of a book can be viewed at books.google.com/books?uid=104498140124963532048
References
- ↑ The basic Google book link is found at https://books.google.com/. The "advanced" interface allowing more specific searches is found at https://books.google.com/advanced_book_search
- 1 2 3 "Where do these books come from?". Google Books Help. Google. Retrieved 10 November 2014.
- ↑ Mark O'Neill (28 January 2009). "Read Complete Magazines Online in Google Books". Make Use Of.
- ↑ "About Magazines search". Google Books Help. Google. Retrieved 13 January 2015.
- ↑ Bergquist, Kevin (2006-02-13). "Google project promotes public good". The University Record. University of Michigan. Retrieved 2007-04-11.
- ↑ Pace, Andrew K. (January 2006). "Is This the Renaissance or the Dark Ages?". American Libraries. American Library Association. Archived from the original on 2007-04-03. Retrieved 2007-04-11.
Google made instant e-book believers out of skeptics even though 10 years of e-book evangelism among librarians had barely made progress.
- 1 2 Malte Herwig, "Google's Total Library", Spiegel Online International, Mar. 28, 2007.
- 1 2 3 4 Copyright infringement suits against Google and their settlement: "Copyright Accord Would Make Millions More Books Available Online". Google Press Center. Retrieved November 22, 2008.
- 1 2 http://www.nytimes.com/2015/10/29/arts/international/google-books-a-complex-and-controversial-experiment.html?_r=0
- ↑ http://www.newyorker.com/business/currency/what-ever-happened-to-google-books
- 1 2 3 4 Google: 129 Million Different Books Have Been Published PC World
- ↑ "Books of the world". Google. August 5, 2010. Retrieved 2010-08-15.
After we exclude serials, we can finally count all the books in the world. There are 129,864,880 of them. At least until Sunday
- 1 2 Google Books Library Project – An enhanced card catalog of the world's books. Google. Retrieved 26 January 2015.
- ↑ Greg Duffy (March 2005). "Google's Cookie and Hacking Google Print". Kuro5hin.
- 1 2 Band, Jonathan. "The Google Library Project: Both Sides of the Story".
- ↑ "Where do you get the information for the 'About this book' page?". Google Books Help. Google. Retrieved 14 November 2014.
- 1 2 Laura Miller (8 December 2010). "Is Google leading an e-book revolution?". Salon.
Google has incorporated reader reviews from the social networking service GoodReads, which helps, as these are often more thoughtful than the average Amazon reader review, but the "related books" suggestion lists still have some kinks to iron out — fans of Rebecca Skloot’s "The Immortal Life of Henrietta Lacks" are referred to a trashy novel titled "Bling Addiction," for example
- 1 2 Perez, Juan Carlos (October 28, 2008). "In Google Book Settlement, Business Trumps Ideals". PC World. Retrieved 2013-08-27.
Of the seven million books Google has scanned, one million are in full preview mode as part of formal publisher agreements. Another one million are public domain works.
- 1 2 3 4 "Books Help". Google. Retrieved 26 January 2015.
- ↑ Tim Parks (13 September 2014). "References, Please". The New York Review.
- ↑ Almaer, Dion. "Weekly Google Code Roundup for August 10th". Google Code. Retrieved 27 August 2013.
- ↑ "Resume of Ted Merrill, Software Engineer". Archived from the original on July 31, 2008. Retrieved 27 August 2013.
Adapted firmware of Elphel 323 camera to meet needs of Google Book Search
- ↑ Kelly, Kevin (May 14, 2006). "Scan This Book!". New York Times Magazine. Retrieved 2008-03-07.
When Google announced in December 2004 that it would digitally scan the books of five major research libraries to make their contents searchable, the promise of a universal library was resurrected. ... From the days of Sumerian clay tablets till now, humans have "published" at least 32 million books, 750 million articles and essays, 25 million songs, 500 million images, 500,000 movies, 3 million videos, TV shows and short films and 100 billion public Web pages.
- ↑ Stephen Shankland (4 May 2009). "Patent reveals Google's book-scanning advantage". CNET.
- ↑ Maureen Clements (30 April 2009). "The Secret Of Google's Book Scanning Machine Revealed". NPR.org.
- 1 2 Zimmer, Ben. "Bigger, Better Google Ngrams: Brace Yourself for the Power of Grammar". Retrieved 2016-09-20.
- 1 2 Laura Miller (9 September 2010). "The trouble with Google Books". Salon.
- ↑ Laura Miller (8 December 2010). "Is Google leading an e-book revolution?". Salon.
- ↑ Great Expectations by Charles Dickens on Google Books reader. Google.
- ↑ "Google Acquisition Will Help Correct Errors in Scanned Works". Retrieved 2016-09-20.
- 1 2 3 4 5 "Google Books History – Google Books". books.google.com. Retrieved 2016-02-22.
- 1 2 O'Sullivan, Joseph and Adam Smith. "All booked up," Googleblog. December 14, 2004.
- ↑ "Authors Guild v. Google Settlement Resources Page". Authors Guild. Retrieved November 22, 2008.
- ↑ "A new chapter". The Economist. October 30, 2008. Retrieved November 22, 2008.
- ↑ Aiken, Paul (2005-09-20). "Authors Guild Sues Google, Citing "Massive Copyright Infringement"". Authors Guild. Archived from the original on 2007-02-09. Retrieved 2007-04-11.
- ↑ Gilbert, Alorie (2005-10-19). "Publishers sue Google over book search project". CNET News. Retrieved 2007-04-11.
- ↑ "The McGraw Hill Companies, Inc.; Pearson Education, Inc.; Penguin Group (USA) Inc.; Simon and Schuster, Inc.; John Wiley and Sons, Inc. Plaintiffs, v. Google Inc., Defendant" (PDF). Retrieved 2007-10-05. PDF file of the complaint. SD. N.Y. Case No. 05-CV-8881-JES.
- ↑ Jen Grant (November 17, 2005). "Judging Book Search by its cover" (blog). Googleblog.
- ↑ "Library partners". Google books. Retrieved 2013-02-27.
- ↑ Colvin, Jennifer. "UC libraries partner with Google to digitize books". University of California. Retrieved 27 August 2013.
- ↑ "University Complutense of Madrid and Google to Make Hundreds of Thousands of Books Available Online". Google. Retrieved 28 August 2013.
- ↑ "New release: UW-Madison Joins Google's Worldwide Book Digitization Project". University of Wisconsin-Madison. Retrieved 28 August 2013.
- ↑ "The University of Virginia Library Joins the Google Books Library Project". Google. Retrieved 28 August 2013.
- ↑ Mills, Elinor. "Bavarian library joins Google book search project". Cnet. Retrieved 28 August 2013.
- ↑ Reed, Brock. "La Bibliothèque, C'est Google" (Wired Campus Newsletter), Chronicle of Higher Education. May 17, 2007.
- ↑ "Google Project". Universiteitsbibliotheek Gent. Retrieved 28 August 2013.
- 1 2 "Google Book Search Project - Menu". Big Ten Academic Alliance. Retrieved 30 June 2016.
- ↑ DeBonis, Laura. "Keio University Joins Google's Library Project". Google Books Search. Retrieved 28 August 2013.
- ↑ "Cornell University Library becomes newest partner in Google Book Search Library Project". Cornell University Library. Retrieved 28 August 2013.
- ↑ Tungare, Manas. "Share and enjoy". Google Books Search. Retrieved 28 August 2013.
- ↑ Google Books.
- ↑ Stricker, Gabriel. "Columbia University joins the Google Book Search Library Project". Google Books Search. Retrieved 28 August 2013.
- ↑ Helft, Miguel (May 24, 2008). "Microsoft Will Shut Down Book Search Program". New York Times. Retrieved 2008-05-24.
Microsoft said it had digitized 750,000 books and indexed 80 million journal articles.
- ↑ Cohen, Noam (February 1, 2009). "Some Fear Google's Power in Digital Books". New York Times. Retrieved 2009-02-02.
Today, that project is known as Google Book Search and, aided by a recent class-action settlement, it promises to transform the way information is collected: who controls the most books; who gets access to those books; how access will be sold and attained.
- ↑ "Massive EU online library looks to compete with Google". Agence France-Presse. November 2008. Retrieved 2008-11-24.
Google, one of the pioneers in this domain on the other hand, claims to have seven million books available for its "Google Book Search" project, which saw the light of day at the end of 2004.
- ↑ Rich, Motoko (January 4, 2009). "Google Hopes to Open a Trove of Little-Seen Books". New York Times. Retrieved 2009-01-05.
The settlement may give new life to copyrighted out-of-print books in a digital form and allow writers to make money from titles that had been out of commercial circulation for years. Of the seven million books Google has scanned so far, about five million are in this category.
- ↑ "Google updates search index with old magazines". MSNBC. Associated Press. December 10, 2008. Retrieved June 29, 2009.
As part of its quest to corral more content published on paper, Google Inc. has made digital copies of more than 1 million articles from magazines that hit the newsstands decades ago.
- ↑ "Official Google Blog: Search and find magazines on Google Book Search". Official Google Blog.
- ↑ "1.5 million books in your pocket". Inside Google Books. Google. 5 February 2009.
- ↑ Rich, Motoko (2009-06-01). "Preparing to Sell E-Books, Google Takes on Amazon". The New York Times. Retrieved 2009-05-31.
- ↑ Faure, Gaelle (December 19, 2009). "French court shuts down Google Books project". Los Angeles Times. Retrieved 2009-12-19.
- ↑ O'Dell, Jolie. "Google Gets Sued by Photographers Over Google Books". Mashable. Retrieved 28 August 2013.
- ↑ Vascellaro, Jessica E. (4 May 2010). "Google Readies Its E-Book Plan, Bringing in a New Sales Approach". The Wall Street Journal. Retrieved 28 August 2013.
- ↑ "Google launches eBookstore with more than 3 million titles". MacWorld.
- ↑ "Judge rejects Google settlement with authors". Market Watch.
- ↑ "Google book scan project slows down". Law Librarian Blog. Archived from the original on 2012-03-15. Retrieved March 2012.
- ↑ Howard, Jennifer. "Google Begins to Scale Back Its Scanning of Books From University Libraries". March 9, 2012.
- ↑ http://www.publishers.org/press/85/
- ↑ "Google and the world brain - Polar Star Films". Google and the world brain - Polar Star Films.
- ↑ "Google Books ruled legal in massive win for fair use".
- ↑ "Siding With Google, Judge Says Book Search Does Not Infringe Copyright", Claire Cain Miller and Julie Bosman, New York Times, November 14, 2013. Retrieved November 17, 2013.
- ↑ "Google book-scanning project legal, says U.S. appeals court". Reuters.
- ↑ US Supreme Court Rejects Challenge to Google Book-Scanning Project April 18, 2016
- ↑ Stein, Linda L.; Lehu, Peter, J (2009). Literary Research and the American Realism and Naturalism Period: Strategies and Sources. p. 261.
- ↑ "Google to scan 800,000 manuscripts, books from Indian university". Ars Technica.
- ↑ "Google to digitise books at Mysore varsity". Hindustan Times. 20 May 2007.
- ↑ Library Partners.
- ↑ "Harvard-Google Project". Harvard University Library. Retrieved 28 August 2013.
- ↑ "Michigan Digitization Project". MLibrary - University of Michigan. Retrieved 28 August 2013.
- ↑ "Press Releases".
- ↑ New York Public Library + Google
- ↑ "Oxford Google Books Project". Bodleian Libraries, University of Oxford. Retrieved 28 August 2013.
- ↑ "Stanford's Role in Google Books". Stanford University Libraries. Retrieved 28 August 2013.
- ↑ "Austrian Books Online". Austrian National Library. Retrieved 14 January 2015.
- 1 2 3 4 Albanese, Andrew (2007-06-15). "Google Book Search Grows". Library Journal. Retrieved 28 August 2013.
- ↑ "Google partenaire numérique officiel de la bibliothèque de Lyon".
- ↑ "Columbia University Libraries Becomes Newest Partner in Google Book Search Library Project". Columbia University Libraries. 2007-12-13. Retrieved 28 August 2013.
- ↑ "Complutense Universidad + Google" (PDF) (in Spanish).
- ↑ "Cornell University Library becomes newest partner in Google Book Search Library Project". Cornell University Library. Retrieved 28 August 2013.
- ↑ Ghent/Gent + Google
- ↑ "Keio University to partner with Google, Inc. for digitalization and release of its library collection to the world For "Formation of Knowledge of the digital era"" (PDF). Keio University. 2007-07-06. Retrieved 28 August 2013.
- ↑ "Google digitaliza 35 mil libros de la Biblioteca de Catalunya libres de derechos de autor". LA VANGUARDIA.
- ↑ Cliatt, Cass (2007-02-05). "Library joins Google project to make books available online". Princeton University. Retrieved 30 August 2013.
- ↑ "UC libraries partner with Google to digitize books". University of California. 2006-08-09. Retrieved 30 August 2013.
- ↑ Cantonal and University Library of Lausanne/Bibliothèque Cantonale et Universitaire (BCU) + Google (in French)
- ↑ Anderson, Nate (2007-05-22). "Google to scan 800,000 manuscripts, books from Indian university". Ars Technica. Retrieved 30 August 2013.
- ↑ "The University of Texas Libraries Partner with Google to Digitize Books". The University of Texas Libraries. 2007-01-19. Retrieved 30 August 2013.
- ↑ Wood, Carol, S. (2006-11-14). "U.Va. Library Joins the Google Books Library Project". University of Virginia. Retrieved 30 August 2013.
- ↑ "University of Wisconsin-Madison Google Digitization Initiative". University of Wisconsin-Madison. Retrieved 30 August 2013.
- ↑ "My Library". Google. Retrieved 6 November 2014.
- ↑ "My Library FAQ". Google Books Help. Google. Retrieved 6 November 2014.
- ↑ Robert Darnton (February 12, 2009). "Google and the Future of Books". The New York Review of Books.
- ↑ 770 F.Supp.2d 666 (SDNY March 22, 2011).
- ↑ Authors Guild v. Google, 2d Cir. July 1, 2013.
- 1 2 "U.S. Appeals Court Rules Google Book Scanning Is Fair Use". Retrieved 2016-09-20.
- ↑ "Authors Guild v. Google, Inc., No. 13-4829 (2d Cir. 2015)". Retrieved 2016-09-21.
- ↑ "Google Books just won a decade-long copyright fight". Washington Post. Retrieved 2016-09-20.
- ↑ Sullivan, Danny (2006-06-28). "Google Book Search Wins Victory In German Challenge" (blog). Search Engine Watch. Retrieved 2006-11-11.
- 1 2 Sage, Adam (December 19, 2009). "French publishers toast triumph over Google". The Times of London. Retrieved 2009-12-18.
- 1 2 3 4 5 Smith, Heather (December 18, 2009). "Google's French Book Scanning Project Halted by Court". Bloomberg. Retrieved 2009-12-18.
- ↑ Oates, John (June 7, 2006). "French publisher sues Google". The Register.
- ↑ "Fine for Google over French books". BBC News. December 18, 2009. Retrieved 2009-12-18.
- ↑ "Google Faces Chinese Lawsuit Over Digital Book Project".
- ↑ "Writer sues Google for copyright infringement". China Daily. Retrieved March 20, 2012.
- ↑ Thomas Claburn (March 6, 2007). "Microsoft Attorney Accuses Google Of Copyright Violations". InformationWeek.
- ↑ Robert B. Townsend, Google Books: Is It Good for History?, Perspectives (September 2007).
- ↑ Kirk McElhearn (5 May 2014). "Ebooks and Typos: Readers, and Consumers, Deserve Better". Kirkville.
- ↑ Dianne See Morrison (6 February 2009). "paidContent.org - The Plot Thickens For E-Books: Google And Amazon Putting More Titles On Mobile Phones". The Washington Post.
- ↑ "Google Books: How bad is the metadata? Let me count the ways….". Music - Technology - Policy. WordPress. 29 September 2009.
- ↑ Alexandra Alter (7 August 2014). "Google and Barnes & Noble Unite to Take On Amazon". The New York Times.
- ↑ Free Books -- useless?
- ↑ A Hierarchical, HMM-based Automatic Evaluation of OCR Accuracy for a Digital Library of Books
- ↑ Major errors prompt questions over Google Book Search's scholarly value
- ↑ Google Books: The Metadata Mess
- ↑ James and Weiss (2012): An Assessment of Google Books’ Metadata, Abstract, p. 1
- ↑ James and Weiss (2012): An Assessment of Google Books’ Metadata, p. 5
- ↑ Geoffrey Nunberg (August 31, 2009). "Google's Book Search: A Disaster for Scholars". The Chronicle of Higher Education.
- ↑ Jean-Noël Jeanneney (2006-10-23). Google and the Myth of Universal Knowledge: A View from Europe (book abstract; Foreword by Ian Wilson). ISBN 0-226-39577-4. Retrieved 2007-02-21.
- ↑ Barbara Quint, "Changes at Google Scholar: A Conversation With Anurag Acharya", Information Today, August 27, 2007.
- ↑ The number of Public Domain books at Google Books can be calculated by looking at the number of Public Domain books at HathiTrust, which is the academic mirror of Google Books. As of May 2010 HathiTrust had over 1 million Public Domain books.
- ↑ "Internet Archive and Library Partners Develop Joint Collection of 80,000+ eBooks To Extend Traditional In-Library Lending Model". San Francisco. February 22, 2011. Retrieved 2011-05-26.
During a library visit, patrons with an OpenLibrary.org account can borrow any of these lendable eBooks using laptops, reading devices or library computers.
- ↑ Languagehat.com
- ↑ "Microsoft starts online library in challenge to Google Books". AFP. Melbourne. 2006-12-08. Retrieved 2008-11-24.
Microsoft launched an online library in a move that pits the world's biggest software company against Google's controversial project to digitize the world's books.
- ↑ Xio, Christina. "Google Books-An Other Popular Service By Google". Retrieved 4 August 2012.
Few years back the Microsoft abandoned the project and now all the books are freely available at the Internet archive.
- ↑ http://version1.europeana.eu/
- ↑ Snyder, Chris (November 20, 2008). "Europe's Answer to Google Book Search Crashes on Day 1". Wired. Retrieved 2008-11-24.
External links
- Google Books homepage
- Google Books Information Page
- Google Books Timeline
- Jeffrey Toobin; Google's Moon Shot
- PublicDomainReprints.org – an experiment that prints public domain books from Google Books
- Robert Darnton – Google & the Future of Books