Introduction to Data Mining

Published: September 2010  Publisher: China Machine Press  Authors: Pang-Ning Tan, Michael Steinbach, Vipin Kumar (USA)  Pages: 769

Preface

Advances in data generation and collection are producing data sets of massive size in commerce and a variety of scientific disciplines. Data warehouses store details of the sales and operations of businesses, Earth-orbiting satellites beam high-resolution images and sensor data back to Earth, and genomics experiments generate sequence, structural, and functional data for an increasing number of organisms. The ease with which data can now be gathered and stored has created a new attitude toward data analysis: Gather whatever data you can whenever and wherever possible. It has become an article of faith that the gathered data will have value, either for the purpose that initially motivated its collection or for purposes not yet envisioned.

The field of data mining grew out of the limitations of current data analysis techniques in handling the challenges posed by these new types of data sets. Data mining does not replace other areas of data analysis, but rather takes them as the foundation for much of its work. While some areas of data mining, such as association analysis, are unique to the field, other areas, such as clustering, classification, and anomaly detection, build upon a long history of work on these topics in other fields. Indeed, the willingness of data mining researchers to draw upon existing techniques has contributed to the strength and breadth of the field, as well as to its rapid growth.

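Summary

This book gives a comprehensive introduction to the theory and methods of data mining, with an emphasis on applying data mining knowledge to solve practical problems. It touches on many disciplines and has broad applicability. The book covers five topics: data, classification, association analysis, clustering, and anomaly detection. Except for anomaly detection, each topic is covered in two chapters: the first presents basic concepts, representative algorithms, and evaluation techniques, and the second discusses advanced concepts and algorithms in greater depth. The aim is to give readers a thorough grounding in the fundamentals of data mining while also introducing important advanced topics.

· Includes a large number of figures and tables, comprehensive examples, and plentiful exercises.
· Requires no database background and only a minimal background in statistics or mathematics.
· Extensive supporting material is available online, including PPT slides, exercise solutions, and data sets.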

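About the Authors

Authors: Pang-Ning Tan (USA), Michael Steinbach (USA), Vipin Kumar (USA)

Pang-Ning Tan is an assistant professor in the Department of Computer Science and Engineering at Michigan State University, where he mainly teaches courses such as data mining and database systems. His research focuses on developing suitable data mining algorithms for a broad range of applications, including medical informatics, earth science, social networks, Web mining, and computer security.

Michael Steinbach holds a B.S. in mathematics, an M.S. in statistics, and a Ph.D. in computer science from the University of Minnesota, and is currently a research associate in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities.

Vipin Kumar is currently head of the Department of Computer Science and Engineering at the University of Minnesota and William Norris Professor. From 1998 to 2005 he served as director of the Army High Performance Computing Research Center.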

Table of Contents

Preface
1 Introduction
 1.1 What Is Data Mining?
 1.2 Motivating Challenges
 1.3 The Origins of Data Mining
 1.4 Data Mining Tasks
 1.5 Scope and Organization of the Book
 1.6 Bibliographic Notes
 1.7 Exercises
2 Data
 2.1 Types of Data
  2.1.1 Attributes and Measurement
  2.1.2 Types of Data Sets
 2.2 Data Quality
  2.2.1 Measurement and Data Collection Issues
  2.2.2 Issues Related to Applications
 2.3 Data Preprocessing
  2.3.1 Aggregation
  2.3.2 Sampling
  2.3.3 Dimensionality Reduction
  2.3.4 Feature Subset Selection
  2.3.5 Feature Creation
  2.3.6 Discretization and Binarization
  2.3.7 Variable Transformation
 2.4 Measures of Similarity and Dissimilarity
  2.4.1 Basics
  2.4.2 Similarity and Dissimilarity between Simple Attributes
  2.4.3 Dissimilarities between Data Objects
  2.4.4 Similarities between Data Objects
  2.4.5 Examples of Proximity Measures
  2.4.6 Issues in Proximity Calculation
  2.4.7 Selecting the Right Proximity Measure
 2.5 Bibliographic Notes
 2.6 Exercises
3 Exploring Data
 3.1 The Iris Data Set
 3.2 Summary Statistics
  3.2.1 Frequencies and the Mode
  3.2.2 Percentiles
  3.2.3 Measures of Location: Mean and Median
  3.2.4 Measures of Spread: Range and Variance
  3.2.5 Multivariate Summary Statistics
  3.2.6 Other Ways to Summarize the Data
 3.3 Visualization
  3.3.1 Motivations for Visualization
  3.3.2 General Concepts
  3.3.3 Techniques
  3.3.4 Visualizing Higher-Dimensional Data
  3.3.5 Do's and Don'ts
 3.4 OLAP and Multidimensional Data Analysis
  3.4.1 Representing Iris Data as a Multidimensional Array
  3.4.2 Multidimensional Data: The General Case
  3.4.3 Analyzing Multidimensional Data
  3.4.4 Final Comments on Multidimensional Data Analysis
 3.5 Bibliographic Notes
 3.6 Exercises
4 Classification: Basic Concepts, Decision Trees, and Model Evaluation
 4.1 Preliminaries
 4.2 General Approach to Solving a Classification Problem
 4.3 Decision Tree Induction
  4.3.1 How a Decision Tree Works
  4.3.2 How to Build a Decision Tree
  4.3.3 Methods for Expressing Attribute Test Conditions
  4.3.4 Measures for Selecting the Best Split
  4.3.5 Algorithm for Decision Tree Induction
  4.3.6 An Example: Web Robot Detection
  4.3.7 Characteristics of Decision Tree Induction
 4.4 Model Overfitting
  4.4.1 Overfitting Due to Presence of Noise
  4.4.2 Overfitting Due to Lack of Representative Samples
  4.4.3 Overfitting and the Multiple Comparison Procedure
  4.4.4 Estimation of Generalization Errors
  4.4.5 Handling Overfitting in Decision Tree Induction
 4.5 Evaluating the Performance of a Classifier
  4.5.1 Holdout Method
  4.5.2 Random Subsampling
  4.5.3 Cross-Validation
  4.5.4 Bootstrap
 4.6 Methods for Comparing Classifiers
  4.6.1 Estimating a Confidence Interval for Accuracy
  4.6.2 Comparing the Performance of Two Models
  4.6.3 Comparing the Performance of Two Classifiers
 4.7 Bibliographic Notes
 4.8 Exercises
5 Classification: Alternative Techniques
6 Association Analysis: Basic Concepts and Algorithms

Chapter Excerpt

What Is an Attribute?

We start with a more detailed definition of an attribute.

Definition 2.1. An attribute is a property or characteristic of an object that may vary, either from one object to another or from one time to another.

For example, eye color varies from person to person, while the temperature of an object varies over time. Note that eye color is a symbolic attribute with a small number of possible values {brown, black, blue, green, hazel, etc.}, while temperature is a numerical attribute with a potentially unlimited number of values.

At the most basic level, attributes are not about numbers or symbols. However, to discuss and more precisely analyze the characteristics of objects, we assign numbers or symbols to them. To do this in a well-defined way, we need a measurement scale.

Definition 2.2. A measurement scale is a rule (function) that associates a numerical or symbolic value with an attribute of an object.

Formally, the process of measurement is the application of a measurement scale to associate a value with a particular attribute of a specific object. While this may seem a bit abstract, we engage in the process of measurement all the time. For instance, we step on a bathroom scale to determine our weight, we classify someone as male or female, or we count the number of chairs in a room to see if there will be enough to seat all the people coming to a meeting. In all these cases, the "physical value" of an attribute of an object is mapped to a numerical or symbolic value.

With this background, we can now discuss the type of an attribute, a concept that is important in determining if a particular data analysis technique is consistent with a specific type of attribute.
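As a rough illustration of Definition 2.2, a measurement scale can be modeled as an ordinary function from an object to a numerical or symbolic value. The sketch below is not taken from the book: the `Person` class, its fields, and the function names `weight_scale`, `eye_color_scale`, and `measure` are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass

# Hypothetical object type; the field names are illustrative assumptions.
@dataclass
class Person:
    name: str
    mass_kg: float   # stands in for the "physical value" of weight
    iris_hue: str    # stands in for the physical appearance of the eye

# Symbolic values the excerpt lists for the eye-color attribute.
EYE_COLORS = {"brown", "black", "blue", "green", "hazel"}

def weight_scale(person: Person) -> float:
    """Numerical scale: a bathroom-scale reading rounded to 0.1 kg."""
    return round(person.mass_kg, 1)

def eye_color_scale(person: Person) -> str:
    """Symbolic scale: map the eye to one of a small set of labels."""
    hue = person.iris_hue.lower()
    return hue if hue in EYE_COLORS else "other"

def measure(scale, obj):
    """Measurement is the application of a scale to a specific object."""
    return scale(obj)

alice = Person("Alice", 61.237, "Hazel")
weight = measure(weight_scale, alice)      # numerical value
color = measure(eye_color_scale, alice)    # symbolic value
```

The same object yields a numerical value under one scale and a symbolic value under another, which is the distinction the excerpt draws between temperature-like and eye-color-like attributes.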

Editor's Recommendation

Introduction to Data Mining (English Edition) is part of the Classic Original Editions series.

User Reviews (31 total)

 
 

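  •   Just got the book and opened it: it is entirely in English, which looks like quite a challenge. No way around it; having chosen the long road, I will improve my data mining skills and my English together. I believe I can do it.
  •   If you are learning data mining, I recommend starting with this book; I do not recommend domestic books.
  •   A must-have book for getting started with data mining!
  •   Comprehensive content and clear concepts. Although it is an introduction, it has ample breadth and works very well as a starting point.
  •   When reading, read the classics. English edition, just got it in hand; the paper is okay, and the content is quite substantial.
  •   I recommend reading more original-language editions, so you can understand the authors' original meaning.
  •   An English book is simply much better than a translation.
  •   Strongly recommend this set of books; both the Chinese and English editions are good.
  •   Bought it recently and am studying it. It is said to be good.
  •   I have read part of it and will keep reading.
  •   nice stuff~
  •   The English is good; it can improve your English and also expand your technical vocabulary.
  •   Bought it for my daughter. She says it is identical to the original edition and the quality is good.
  •   A classic textbook that never gets old.
  •   A classic book at a fair price.
  •   Wasting money is shameful; I recommend buying the Chinese edition.
  •   The content is beyond reproach and the print quality is also very good. Recommended!
  •   I really dislike the way foreign textbooks go on and on without getting to the point. Maybe my IQ is too low; I cannot appreciate the beauty of this kind of book. Of course, some foreign textbooks are quite good.
    This book uses simple language and introduces the general techniques of data mining. It is not difficult; to study the subject in depth you will need to read the papers it lists.
    ...Still, I really dislike how the first two chapters take forever to get to the point; my attention drifted right away!!
  •   Introduction to Data Mining [English edition]: learn the technology and English at the same time.
  •   I had heard of this book for a long time and bought it because of its good reviews on Douban. It could not be better as an introduction to data mining. The English edition is much more comfortable to read than those poor-quality translations. The paper is a bit thin, but it feels okay.
  •   A classic, excellent data mining book, and the price is quite reasonable.
  •   A classic introductory book on data mining. Feels very good.
  •   32mo format.
  •   The best data mining book, but the paper is terrible.
  •   English edition; reading it slowly.
  •   Compact, all in English.
  •   Broadly applicable.
  •   A beautifully made book and a classic textbook.
  •   A must for introductory data mining study~!
  •   Good. Bought it for a friend; have not found any shortcomings so far.
  •   Would be nice to have more of the cheaper ones.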
 
