Natural Language Processing with Python

Publication date: 2010-6  Publisher: Southeast University Press  Authors: Bird (UK), Klein (UK), Loper (US)  Pages: 479  

Preface

  This is a book about Natural Language Processing. By "natural language" we mean a language that is used for everyday communication by humans; languages such as English, Hindi, or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages have evolved as they pass from generation to generation, and are hard to pin down with explicit rules. We will take Natural Language Processing, or NLP for short, in a wide sense to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles. At the other extreme, NLP involves "understanding" complete human utterances, at least to the extent of being able to give useful responses to them.

  Technologies based on NLP are becoming increasingly widespread. For example, phones and handheld computers support predictive text and handwriting recognition; web search engines give access to information locked up in unstructured text; machine translation allows us to retrieve texts written in Chinese and read them in Spanish. By providing more natural human-machine interfaces, and more sophisticated access to stored information, language processing has come to play a central role in the multilingual information society.

  This book provides a highly accessible introduction to the field of NLP. It can be used for individual study or as the textbook for a course on natural language processing or computational linguistics, or as a supplement to courses in artificial intelligence, text mining, or corpus linguistics. The book is intensely practical, containing hundreds of fully worked examples and graded exercises.
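  The simplest task the preface mentions, counting word frequencies to compare writing styles, can be sketched in a few lines of standard-library Python. The two sample sentences and the helper name word_frequencies are illustrative assumptions, not taken from the book, and the regular-expression tokenizer is deliberately crude.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its count."""
    # Crude tokenizer: any run of letters or apostrophes counts as a word.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Two made-up snippets standing in for texts by different writers.
formal = "The committee shall review the proposal and the budget."
casual = "I think the plan is fine, and I like the idea."

formal_freq = word_frequencies(formal)
casual_freq = word_frequencies(casual)

# Compare how often each text uses a few common function words.
for word in ("the", "and", "i"):
    print(word, formal_freq[word], casual_freq[word])
```

  A Counter returns 0 for words it has never seen, so the comparison loop needs no special handling for words missing from one of the texts.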

Summary

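  Natural Language Processing with Python (Reprint Edition) offers a highly accessible introduction to natural language processing, the field that supports a range of language technologies, from predictive text and email filtering to automatic summarization and translation. With this book, you will learn to write Python programs that work with large collections of unstructured text. You will also access richly annotated datasets using a comprehensive range of linguistic data structures, and come to understand the main algorithms for analyzing the content and structure of written communication.

  With its ample examples and exercises, Natural Language Processing with Python will help you:
  Extract information from unstructured text, either to guess the topic or to identify "named entities";
  Analyze the linguistic structure of text, including parsing and semantic analysis;
  Access popular linguistic databases, including WordNet and treebanks;
  Integrate techniques drawn from fields as diverse as linguistics and artificial intelligence.

  Natural Language Processing with Python (Reprint Edition) will help you gain practical NLP skills using the Python programming language and the Natural Language Toolkit (NLTK). If you are interested in developing web applications, analyzing multilingual news sources, or documenting endangered languages, or if you simply want to see how human language works from a programmer's perspective, you will find Natural Language Processing with Python both fascinating and immensely useful.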

About the Authors

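  Steven Bird is an associate professor in the Department of Computer Science and Software Engineering at the University of Melbourne, and a senior research associate at the Linguistic Data Consortium at the University of Pennsylvania.
  Ewan Klein is a professor of language technology in the School of Informatics at the University of Edinburgh.
  Edward Loper recently received his PhD in machine learning for natural language processing from the University of Pennsylvania, and is now a researcher at BBN Technologies in Boston.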

Table of Contents

Preface
1. Language Processing and Python
  1.1 Computing with Language: Texts and Words
  1.2 A Closer Look at Python: Texts as Lists of Words
  1.3 Computing with Language: Simple Statistics
  1.4 Back to Python: Making Decisions and Taking Control
  1.5 Automatic Natural Language Understanding
  1.6 Summary
  1.7 Further Reading
  1.8 Exercises
2. Accessing Text Corpora and Lexical Resources
  2.1 Accessing Text Corpora
  2.2 Conditional Frequency Distributions
  2.3 More Python: Reusing Code
  2.4 Lexical Resources
  2.5 WordNet
  2.6 Summary
  2.7 Further Reading
  2.8 Exercises
3. Processing Raw Text
  3.1 Accessing Text from the Web and from Disk
  3.2 Strings: Text Processing at the Lowest Level
  3.3 Text Processing with Unicode
  3.4 Regular Expressions for Detecting Word Patterns
  3.5 Useful Applications of Regular Expressions
  3.6 Normalizing Text
  3.7 Regular Expressions for Tokenizing Text
  3.8 Segmentation
  3.9 Formatting: From Lists to Strings
  3.10 Summary
  3.11 Further Reading
  3.12 Exercises
4. Writing Structured Programs
  4.1 Back to the Basics
  4.2 Sequences
  4.3 Questions of Style
  4.4 Functions: The Foundation of Structured Programming
  4.5 Doing More with Functions
  4.6 Program Development
  4.7 Algorithm Design
  4.8 A Sample of Python Libraries
  4.9 Summary
  4.10 Further Reading
  4.11 Exercises
5. Categorizing and Tagging Words
  5.1 Using a Tagger
  5.2 Tagged Corpora
  5.3 Mapping Words to Properties Using Python Dictionaries
  5.4 Automatic Tagging
  5.5 N-Gram Tagging
  5.6 Transformation-Based Tagging
  5.7 How to Determine the Category of a Word
  5.8 Summary
  5.9 Further Reading
  5.10 Exercises
6. Learning to Classify Text
  6.1 Supervised Classification
  6.2 Further Examples of Supervised Classification
  6.3 Evaluation
  6.4 Decision Trees
  6.5 Naive Bayes Classifiers
  6.6 Maximum Entropy Classifiers
  6.7 Modeling Linguistic Patterns
  6.8 Summary
  6.9 Further Reading
  6.10 Exercises
7. Extracting Information from Text
  7.1 Information Extraction
  7.2 Chunking
  7.3 Developing and Evaluating Chunkers
  7.4 Recursion in Linguistic Structure
  7.5 Named Entity Recognition
  7.6 Relation Extraction
  7.7 Summary
  7.8 Further Reading
  7.9 Exercises
8. Analyzing Sentence Structure
  8.1 Some Grammatical Dilemmas
  8.2 What's the Use of Syntax?
  8.3 Context-Free Grammar
  8.4 Parsing with Context-Free Grammar
  8.5 Dependencies and Dependency Grammar
  8.6 Grammar Development
  8.7 Summary
  8.8 Further Reading
  8.9 Exercises
9. Building Feature-Based Grammars
  9.1 Grammatical Features
  9.2 Processing Feature Structures
  9.3 Extending a Feature-Based Grammar
  9.4 Summary
  9.5 Further Reading
  9.6 Exercises
10. Analyzing the Meaning of Sentences
  10.1 Natural Language Understanding
  10.2 Propositional Logic
  10.3 First-Order Logic
  10.4 The Semantics of English Sentences
  10.5 Discourse Semantics
  10.6 Summary
  10.7 Further Reading
  10.8 Exercises
11. Managing Linguistic Data
  11.1 Corpus Structure: A Case Study
  11.2 The Life Cycle of a Corpus
  11.3 Acquiring Data
  11.4 Working with XML
  11.5 Working with Toolbox Data
  11.6 Describing Language Resources Using OLAC Metadata
  11.7 Summary
  11.8 Further Reading
  11.9 Exercises
Afterword: The Language Challenge
Bibliography
NLTK Index
General Index

Chapter Excerpt

  Back in elementary school you learned the difference between nouns, verbs, adjectives, and adverbs. These "word classes" are not just the idle invention of grammarians, but are useful categories for many language processing tasks. As we will see, they arise from simple analysis of the distribution of words in text. The goal of this chapter is to answer the following questions:

  1. What are lexical categories, and how are they used in natural language processing?
  2. What is a good Python data structure for storing words and their categories?
  3. How can we automatically tag each word of a text with its word class?

  Along the way, we'll cover some fundamental techniques in NLP, including sequence labeling, n-gram models, backoff, and evaluation. These techniques are useful in many areas, and tagging gives us a simple context in which to present them. We will also see how tagging is the second step in the typical NLP pipeline, following tokenization.

  The process of classifying words into their parts-of-speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts-of-speech are also known as word classes or lexical categories. The collection of tags used for a particular task is known as a tagset. Our emphasis in this chapter is on exploiting tags, and tagging text automatically.
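  Two ideas from this excerpt, storing words with their categories in a Python dictionary and falling back to a default tag when a lookup fails (a simple form of backoff), can be sketched without any library. The tiny hand-tagged sample, the tag names, and the function names below are illustrative assumptions; this is a toy lookup tagger, not NLTK's tagger API and not the book's own code.

```python
from collections import Counter, defaultdict

# A tiny hand-tagged training sample (made up for illustration).
tagged_words = [
    ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
    ("the", "DET"), ("run", "NOUN"), ("was", "VERB"), ("long", "ADJ"),
    ("dogs", "NOUN"), ("run", "VERB"), ("run", "VERB"),
]

def train_unigram_model(pairs):
    """Map each word to the tag it most often carries in the training data."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_tokens(tokens, model, default_tag="NOUN"):
    """Tag each token by dictionary lookup, backing off to default_tag."""
    return [(tok, model.get(tok, default_tag)) for tok in tokens]

model = train_unigram_model(tagged_words)

# "cat" never appears in the training sample, so it gets the default tag.
print(tag_tokens(["the", "cat", "runs"], model))
```

  The dictionary answers the data-structure question in the simplest way: one key per word, one most-likely tag per key. Words the model has never seen are exactly where the default tag takes over.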

Media Coverage and Reviews

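  "Rarely does a book tackle such a difficult computing problem with so clear an approach and such clean code... This is an excellent introduction to natural language processing."
  Ken Getz, Senior Consultant, MCW Technologies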
