Introduction to Information Retrieval

Published: 2010-1  Publisher: Posts & Telecom Press  Authors: Christopher D. Manning (US), Prabhakar Raghavan (US), Hinrich Schütze (Germany)  Pages: 482  Word count: 605,000

Preface

  As recently as the 1990s, studies showed that most people preferred getting information from other people rather than from information retrieval (IR) systems. Of course, in that time period, most people also used human travel agents to book their travel. However, during the last decade, relentless optimization of information retrieval effectiveness has driven web search engines to new quality levels at which most people are satisfied most of the time, and web search has become a standard and often preferred source of information finding. For example, the 2004 Pew Internet Survey (Fallows 2004) found that "92% of Internet users say the Internet is a good place to go for getting everyday information." To the surprise of many, the field of information retrieval has moved from being a primarily academic discipline to being the basis underlying most people's preferred means of information access. This book presents the scientific underpinnings of this field, at a level accessible to graduate students as well as advanced undergraduates.

  Information retrieval did not begin with the Web. In response to various challenges of providing information access, the field of IR evolved to give principled approaches to searching various forms of content. The field began with scientific publications and library records but soon spread to other forms of content, particularly those of information professionals, such as journalists, lawyers, and doctors. Much of the scientific research on IR has occurred in these contexts, and much of the continued practice of IR deals with providing access to unstructured information in various corporate and governmental domains, and this work forms much of the foundation of our book.

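Synopsis

  This book is a textbook on information retrieval that aims to present a modern approach to the subject from a computer science perspective. Starting from basic concepts, it covers web search as well as text classification and text clustering, and it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents, of methods for evaluating such systems, and of the use of machine learning methods on text collections.

  All important ideas in the book are explained with examples and supported by figures. The book is well suited as an introductory textbook for information retrieval courses taken by advanced undergraduates and graduate students in computer science and related fields, and it is equally suitable for researchers and professionals.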

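About the Authors

  Christopher D. Manning holds a PhD in linguistics from Stanford University and is currently an associate professor of computer science and linguistics at Stanford. His main research areas are statistical natural language processing, information extraction and representation, text understanding, and text mining.

  Prabhakar Raghavan holds a PhD from the University of California, Berkeley. He is currently head of Yahoo! Labs and a consulting professor in the Department of Computer Science at Stanford University, and he is a Fellow of both the ACM and the IEEE. His main research interests are text and web data mining and algorithm design. He was previously CTO of Verity and held management positions at IBM Research.

  Hinrich Schütze holds a PhD from Stanford University and is currently chair of theoretical computational linguistics at the Institute for Natural Language Processing, University of Stuttgart. He worked in Silicon Valley for many years: at the Xerox Palo Alto Research Center, as vice president of Outride (later acquired by Google), as CTO of Novation Biosciences, and as chief scientist at Enkata.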

Table of Contents

1 Boolean retrieval
2 The term vocabulary and postings lists
3 Dictionaries and tolerant retrieval
4 Index construction
5 Index compression
6 Scoring, term weighting, and the vector space model
7 Computing scores in a complete search system
8 Evaluation in information retrieval
9 Relevance feedback and query expansion
10 XML retrieval
11 Probabilistic information retrieval
12 Language models for information retrieval
13 Text classification and Naive Bayes
14 Vector space classification
15 Support vector machines and machine learning on documents
16 Flat clustering
17 Hierarchical clustering
18 Matrix decompositions and latent semantic indexing
19 Web search basics
20 Web crawling and indexes
21 Link analysis
Index
Bibliography

Chapter Excerpt

  An example information retrieval problem

  A fat book that many people own is Shakespeare's Collected Works. Suppose you wanted to determine which plays of Shakespeare contain the words Brutus AND Caesar AND NOT Calpurnia. One way to do that is to start at the beginning and to read through all the text, noting for each play whether it contains Brutus and Caesar and excluding it from consideration if it contains Calpurnia. The simplest form of document retrieval is for a computer to do this sort of linear scan through documents. This process is commonly referred to as grepping through text, after the Unix command grep, which performs this process. Grepping through text can be a very effective process, especially given the speed of modern computers, and often allows useful possibilities for wildcard pattern matching through the use of regular expressions. With modern computers, for simple querying of modest collections (the size of Shakespeare's Collected Works is a bit under one million words of text in total), you really need nothing more.

  But for many purposes, you do need more:

  1. To process large document collections quickly. The amount of online data has grown at least as quickly as the speed of computers, and we would now like to be able to search collections that total in the order of billions to trillions of words.
  2. To allow more flexible matching operations. For example, it is impractical to perform the query Romans NEAR countrymen with grep, where NEAR might be defined as within 5 words or within the same sentence.
  3. To allow ranked retrieval. In many cases, you want the best answer to an information need among many documents that contain certain words.
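
  The linear scan described in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three "plays" are invented toy strings, not real text, and the helper names `contains` and `linear_scan` are hypothetical. Every query rereads every document in full, which is the cost that motivates indexing.

```python
import re

# Invented toy texts standing in for full plays (assumption, not real excerpts).
PLAYS = {
    "Play A": "Brutus speaks with Caesar while Calpurnia listens.",
    "Play B": "An actor recalls how Brutus once struck Caesar down.",
    "Play C": "A soldier sees a dagger and says nothing of Rome.",
}

def contains(text, word):
    """Return True if `word` occurs in `text` as a whole word, ignoring case."""
    return re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE) is not None

def linear_scan(plays, required, excluded):
    """Scan every document and keep those with all required and no excluded words."""
    hits = []
    for title, text in plays.items():
        has_required = all(contains(text, w) for w in required)
        has_excluded = any(contains(text, w) for w in excluded)
        if has_required and not has_excluded:
            hits.append(title)
    return hits

# Brutus AND Caesar AND NOT Calpurnia
result = linear_scan(PLAYS, required=["Brutus", "Caesar"], excluded=["Calpurnia"])
print(result)
```

  The running time grows with the total size of the collection for each query, which is acceptable for a single book but not for collections of billions of words.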
  The way to avoid linearly scanning the texts for each query is to index the documents in advance. Let us stick with Shakespeare's Collected Works, and use it to introduce the basics of the Boolean retrieval model. Suppose we record for each document (here a play of Shakespeare's) whether it contains each word out of all the words Shakespeare used (Shakespeare used about 32,000 different words). The result is a binary term-document incidence matrix, as in Figure 1.1. Terms are the indexed units (further discussed in Section 2.2); they are usually words, and for the moment you can think of them as words, but the information retrieval literature normally speaks of terms because some of them, such as perhaps I-9 or Hong Kong, are not usually thought of as words.
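
  A minimal sketch of how a binary term-document incidence matrix answers the query Brutus AND Caesar AND NOT Calpurnia. Figure 1.1 is not reproduced on this page, so the play list and the 0/1 values below are illustrative assumptions, and the function names `row_bits` and `boolean_and_not` are hypothetical. Each term's row is packed into an integer so that the query becomes bitwise AND and NOT.

```python
# Columns of the matrix: one per play (illustrative selection).
PLAYS = ["Antony and Cleopatra", "Julius Caesar", "The Tempest",
         "Hamlet", "Othello", "Macbeth"]

# Rows of the matrix: 1 if the term occurs in the play (assumed values).
INCIDENCE = {
    "Brutus":    [1, 1, 0, 1, 0, 0],
    "Caesar":    [1, 1, 0, 1, 1, 1],
    "Calpurnia": [0, 1, 0, 0, 0, 0],
}

def row_bits(term):
    """Pack a term's row into an int; the first play is the highest bit."""
    bits = 0
    for flag in INCIDENCE[term]:
        bits = (bits << 1) | flag
    return bits

def boolean_and_not(required, excluded):
    """AND the rows of required terms with the complements of excluded terms."""
    n = len(PLAYS)
    mask = (1 << n) - 1          # keeps the complement within n bits
    answer = mask
    for term in required:
        answer &= row_bits(term)
    for term in excluded:
        answer &= ~row_bits(term) & mask
    # Decode the answer vector back into play titles.
    return [play for i, play in enumerate(PLAYS) if (answer >> (n - 1 - i)) & 1]

matches = boolean_and_not(["Brutus", "Caesar"], ["Calpurnia"])
print(matches)
```

  With these assumed rows the computation is 110100 AND 110111 AND 101111 = 100100, so the first and fourth plays match. No text is reread at query time; only the precomputed rows are consulted.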

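Media Coverage and Reviews

  "How should SVM, XML, DNS, and LSI be ordered? What are spam, cloaked pages, and doorway pages in information retrieval? How do MapReduce and other parallel computing approaches make the leap from megabytes (MB) to petabytes (PB)? You will find the answers to all of these questions in this book, which for the first time presents the complex process of building a web search engine to the reader as a clear, panoramic picture."
  - Peter Norvig, Director of Research, Google

  "This book gives a comprehensive, fresh, and accurate introduction to the pivotal and fast-moving field of information retrieval. A textbook like this is badly needed."
  - Raymond J. Mooney, Professor, University of Texas at Austin

  "The book's content is fresh and its choice of material distinctive; it gives a vivid account of the fundamentals and future directions of information retrieval."
  - Jon Kleinberg, Professor, Cornell University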

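Editor's Recommendation

  Introduction to Information Retrieval (English edition) takes a computer science perspective, introduces the fundamentals of information retrieval, and reviews current developments in the field. It focuses on core search engine technologies such as document classification and document clustering, as well as machine learning and numerical methods. All important ideas in the book are explained with examples, vividly and engagingly, combining theory with practice.

  The three authors are all leading experts in information retrieval, two from academia and one from Silicon Valley industry, which gives the book both a solid theoretical foundation and state-of-the-art technical standing. As a result, the book was regarded as an authoritative work in the field as soon as it was published, has attracted wide attention, and has been adopted by many well-known universities worldwide as the textbook for their information retrieval courses.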


User Reviews (14 total)

 
 

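  •   This book does a good job of explaining IR from every angle. It is a classic textbook for learning and understanding IR. The content is also fairly new, summarizing the IR research results of recent years.
  •   The book is pretty good. It looks like a genuine copy.
  •   The book itself is fine, but it arrived badly worn, which is painful to see.
  •   The content is very good, but the print quality is not great!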
  •     An introductory book on search engines. It touches on every aspect, is rigorous, and is easy to follow.
      An introductory classic.
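  •     As an introductory book, it is not bad. It introduces several important concepts in information retrieval: inverted indexes and search engines; tf-idf weighting; the vector space model; evaluation of information retrieval, including evaluation of ranked results with MAP, ROC curves, NDCG, and so on; relevance feedback and pseudo-relevance feedback; probabilistic retrieval models and the BM25 algorithm; retrieval models based on language modeling; various text classification techniques (NB, VSM, SVM); various text clustering techniques (flat, hierarchical, LSI); and finally three chapters on web search, although the web material is basic and shallow, without much depth. I especially recommend the middle chapters (Chapters 6, 7, 8, 9, 11, and 12).
  •     For beginners in search engines, this book is absolutely worth reading. The authors move from the simplest Boolean retrieval to a complete search engine, going deeper step by step and guiding the reader to think along the way. The book touches on the architecture and algorithms needed to build a large search engine; after reading it you will have a general picture of search engines and an understanding of their basic principles. A search engine does not merely retrieve information. Another, more important job is ranking the returned results, and that is often crucial.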
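  •     This book is good. Worth a read.
      Christopher D. Manning graduated from the Australian National University in 1989 and received his PhD in linguistics from Stanford University in 1995. He taught linguistics at Carnegie Mellon University and then at the University of Sydney, and since 1999 has been an associate professor of computer science and linguistics at Stanford University. His main research areas are statistical natural language processing, information extraction and representation, text understanding, and text mining.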
      
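  •     Stanford's introductory IR book. Both CMU and Stanford use it as their introductory IR text. Very nice. Some chapters will be easier going if you have a background in statistics.
  •     I first saw this book the year before last, when it was still only an electronic draft. It covers essentially everything IR involves, and fairly comprehensively.
      If your English reading ability is decent, I recommend reading it; you will certainly come away with a fairly complete understanding of IR.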
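  •   Hello, does the original poster still remember the derivation of the ranking function in Chapter 11? I am looking for an explanation of the step from 11-15 to 11-16...
  •   But I still feel it is not suitable for people with no IR background at all; some of the later chapters have real depth.
  •   But I still feel it is not suitable for people with no IR background at all; some of the later chapters have real depth.
    ========================
    Those later chapters are the machine learning part. They introduce some machine learning basics, because IR today uses machine learning in many places.
  •   Which draft version did you read? I found an April 2009 version on the official site, but I am not sure whether that is the one.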
 
