Publication date: September 2010 Publisher: China Machine Press Authors: Pang-Ning Tan, Michael Steinbach, Vipin Kumar (USA) Pages: 769
Tags: none
Preface
Advances in data generation and collection are producing data sets of massive size in commerce and a variety of scientific disciplines. Data warehouses store details of the sales and operations of businesses, Earth-orbiting satellites beam high-resolution images and sensor data back to Earth, and genomics experiments generate sequence, structural, and functional data for an increasing number of organisms. The ease with which data can now be gathered and stored has created a new attitude toward data analysis: Gather whatever data you can whenever and wherever possible. It has become an article of faith that the gathered data will have value, either for the purpose that initially motivated its collection or for purposes not yet envisioned.
The field of data mining grew out of the limitations of current data analysis techniques in handling the challenges posed by these new types of data sets. Data mining does not replace other areas of data analysis, but rather takes them as the foundation for much of its work. While some areas of data mining, such as association analysis, are unique to the field, other areas, such as clustering, classification, and anomaly detection, build upon a long history of work on these topics in other fields. Indeed, the willingness of data mining researchers to draw upon existing techniques has contributed to the strength and breadth of the field, as well as to its rapid growth.
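Summary
This book gives a comprehensive introduction to the theory and methods of data mining, with an emphasis on applying data mining knowledge to practical problems. It touches on many disciplines and is widely applicable. The book covers five topics: data, classification, association analysis, clustering, and anomaly detection. Except for anomaly detection, each topic is covered in two chapters: the first presents basic concepts, representative algorithms, and evaluation techniques, and the second discusses advanced concepts and algorithms in more depth. The aim is to give readers a thorough understanding of the foundations of data mining while also introducing important advanced topics.
· Contains a large number of figures and tables, comprehensive examples, and extensive exercises.
· Requires no database background and only a minimal background in statistics or mathematics.
· Extensive supplementary teaching resources are available online, including PPT slides, exercise solutions, and data sets.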
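About the Authors
Authors: Pang-Ning Tan (USA), Michael Steinbach (USA), Vipin Kumar (USA)
Pang-Ning Tan is an assistant professor in the Department of Computer Science and Engineering at Michigan State University, where he mainly teaches courses such as data mining and database systems. His research focuses on developing data mining algorithms suited to a broad range of applications, including medical informatics, earth science, social networks, Web mining, and computer security.
Michael Steinbach holds a bachelor's degree in mathematics, a master's degree in statistics, and a Ph.D. in computer science from the University of Minnesota, and is currently a research associate in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities.
Vipin Kumar is currently head of the Department of Computer Science and Engineering at the University of Minnesota and William Norris Professor. From 1988 to 2005 he served as director of the U.S. Army High Performance Computing Research Center.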
Table of Contents
Preface
1 Introduction
  1.1 What Is Data Mining?
  1.2 Motivating Challenges
  1.3 The Origins of Data Mining
  1.4 Data Mining Tasks
  1.5 Scope and Organization of the Book
  1.6 Bibliographic Notes
  1.7 Exercises
2 Data
  2.1 Types of Data
    2.1.1 Attributes and Measurement
    2.1.2 Types of Data Sets
  2.2 Data Quality
    2.2.1 Measurement and Data Collection Issues
    2.2.2 Issues Related to Applications
  2.3 Data Preprocessing
    2.3.1 Aggregation
    2.3.2 Sampling
    2.3.3 Dimensionality Reduction
    2.3.4 Feature Subset Selection
    2.3.5 Feature Creation
    2.3.6 Discretization and Binarization
    2.3.7 Variable Transformation
  2.4 Measures of Similarity and Dissimilarity
    2.4.1 Basics
    2.4.2 Similarity and Dissimilarity between Simple Attributes
    2.4.3 Dissimilarities between Data Objects
    2.4.4 Similarities between Data Objects
    2.4.5 Examples of Proximity Measures
    2.4.6 Issues in Proximity Calculation
    2.4.7 Selecting the Right Proximity Measure
  2.5 Bibliographic Notes
  2.6 Exercises
3 Exploring Data
  3.1 The Iris Data Set
  3.2 Summary Statistics
    3.2.1 Frequencies and the Mode
    3.2.2 Percentiles
    3.2.3 Measures of Location: Mean and Median
    3.2.4 Measures of Spread: Range and Variance
    3.2.5 Multivariate Summary Statistics
    3.2.6 Other Ways to Summarize the Data
  3.3 Visualization
    3.3.1 Motivations for Visualization
    3.3.2 General Concepts
    3.3.3 Techniques
    3.3.4 Visualizing Higher-Dimensional Data
    3.3.5 Do's and Don'ts
  3.4 OLAP and Multidimensional Data Analysis
    3.4.1 Representing Iris Data as a Multidimensional Array
    3.4.2 Multidimensional Data: The General Case
    3.4.3 Analyzing Multidimensional Data
    3.4.4 Final Comments on Multidimensional Data Analysis
  3.5 Bibliographic Notes
  3.6 Exercises
4 Classification: Basic Concepts, Decision Trees, and Model Evaluation
  4.1 Preliminaries
  4.2 General Approach to Solving a Classification Problem
  4.3 Decision Tree Induction
    4.3.1 How a Decision Tree Works
    4.3.2 How to Build a Decision Tree
    4.3.3 Methods for Expressing Attribute Test Conditions
    4.3.4 Measures for Selecting the Best Split
    4.3.5 Algorithm for Decision Tree Induction
    4.3.6 An Example: Web Robot Detection
    4.3.7 Characteristics of Decision Tree Induction
  4.4 Model Overfitting
    4.4.1 Overfitting Due to Presence of Noise
    4.4.2 Overfitting Due to Lack of Representative Samples
    4.4.3 Overfitting and the Multiple Comparison Procedure
    4.4.4 Estimation of Generalization Errors
    4.4.5 Handling Overfitting in Decision Tree Induction
  4.5 Evaluating the Performance of a Classifier
    4.5.1 Holdout Method
    4.5.2 Random Subsampling
    4.5.3 Cross-Validation
    4.5.4 Bootstrap
  4.6 Methods for Comparing Classifiers
    4.6.1 Estimating a Confidence Interval for Accuracy
    4.6.2 Comparing the Performance of Two Models
    4.6.3 Comparing the Performance of Two Classifiers
  4.7 Bibliographic Notes
  4.8 Exercises
5 Classification: Alternative Techniques
6 Association Analysis: Basic Concepts and Algorithms
Chapter Excerpt
What Is an Attribute?
We start with a more detailed definition of an attribute.
Definition 2.1. An attribute is a property or characteristic of an object that may vary, either from one object to another or from one time to another.
For example, eye color varies from person to person, while the temperature of an object varies over time. Note that eye color is a symbolic attribute with a small number of possible values {brown, black, blue, green, hazel, etc.}, while temperature is a numerical attribute with a potentially unlimited number of values.
At the most basic level, attributes are not about numbers or symbols. However, to discuss and more precisely analyze the characteristics of objects, we assign numbers or symbols to them. To do this in a well-defined way, we need a measurement scale.
Definition 2.2. A measurement scale is a rule (function) that associates a numerical or symbolic value with an attribute of an object.
Formally, the process of measurement is the application of a measurement scale to associate a value with a particular attribute of a specific object. While this may seem a bit abstract, we engage in the process of measurement all the time. For instance, we step on a bathroom scale to determine our weight, we classify someone as male or female, or we count the number of chairs in a room to see if there will be enough to seat all the people coming to a meeting. In all these cases, the "physical value" of an attribute of an object is mapped to a numerical or symbolic value.
With this background, we can now discuss the type of an attribute, a concept that is important in determining if a particular data analysis technique is consistent with a specific type of attribute.
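Definition 2.2 treats a measurement scale as a function from an object to a value, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Person` class, its fields, and the functions `weight_kg`, `eye_color`, and `measure` are hypothetical names that do not come from the book.

```python
from dataclasses import dataclass
from typing import Callable, Union

Value = Union[float, int, str]


@dataclass
class Person:
    # Hypothetical stand-ins for the "physical values" of an object.
    mass_grams: float
    iris_pigment: str


# A measurement scale is a rule (function) that maps an object to a value.
MeasurementScale = Callable[[Person], Value]


def weight_kg(p: Person) -> float:
    """Numerical scale: a bathroom-scale reading in kilograms, one decimal."""
    return round(p.mass_grams / 1000, 1)


def eye_color(p: Person) -> str:
    """Symbolic scale: maps pigment to one of a small set of labels."""
    known = {"brown", "black", "blue", "green", "hazel"}
    return p.iris_pigment if p.iris_pigment in known else "other"


def measure(scale: MeasurementScale, obj: Person) -> Value:
    """Measurement: apply a scale to one specific object."""
    return scale(obj)


alice = Person(mass_grams=61234.0, iris_pigment="hazel")
print(measure(weight_kg, alice))  # numerical value
print(measure(eye_color, alice))  # symbolic value
```

The same object yields a numerical value under one scale and a symbolic value under another, which is the distinction the excerpt draws between temperature and eye color.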
Editor's Recommendation
Introduction to Data Mining (English Edition) is part of the Classic Original Editions series.