Data Analysis with Open Source Tools

Publication date: 2011-5  Publisher: Southeast University  Author: Philipp K. Janert  Pages: 509

Summary

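Collecting data is relatively easy, but turning raw information into useful data requires knowing how to extract precisely what you need. Through the in-depth explanations in this book by Philipp K. Janert, intermediate and experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You will learn how to look at data to find out what information it contains, how to capture those ideas in conceptual models, and how to feed your understanding back into your organization through business plans, accurate metrics reports, and other means.
You will work through each concept gradually in the hands-on workshops at the end of each chapter. Most importantly, you will learn how to think about the data you want to obtain, rather than relying on tools to do the thinking for you.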

About the Author

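Philipp K. Janert currently provides consulting services in data analysis and mathematical modeling; he previously worked as a physicist and a software engineer. He is the author of "Gnuplot in Action: Understanding Data with Graphs" (Manning) and has written for the O'Reilly Network, IBM developerWorks, and IEEE Software. He holds a Ph.D. in theoretical physics from the University of Washington.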

Table of Contents

PREFACE
1 INTRODUCTION
Data Analysis
What's in This Book
What's with the Workshops?
What's with the Math?
What You'll Need
What's Missing
PART I Graphics: Looking at Data
2 A SINGLE VARIABLE: SHAPE AND DISTRIBUTION
Dot and Jitter Plots
Histograms and Kernel Density Estimates
The Cumulative Distribution Function
Rank-Order Plots and Lift Charts
Only When Appropriate: Summary Statistics and Box Plots
Workshop: NumPy
Further Reading
3 TWO VARIABLES: ESTABLISHING RELATIONSHIPS
Scatter Plots
Conquering Noise: Smoothing
Logarithmic Plots
Banking
Linear Regression and All That
Showing What's Important
Graphical Analysis and Presentation Graphics
Workshop: matplotlib
Further Reading
4 TIME AS A VARIABLE: TIME-SERIES ANALYSIS
Examples
The Task
Smoothing
Don't Overlook the Obvious!
The Correlation Function
Optional: Filters and Convolutions
Workshop: scipy.signal
Further Reading
5 MORE THAN TWO VARIABLES: GRAPHICAL MULTIVARIATE ANALYSIS
False-Color Plots
A Lot at a Glance: Multiplots
Composition Problems
Novel Plot Types
Interactive Explorations
Workshop: Tools for Multivariate Graphics
Further Reading
6 INTERMEZZO: A DATA ANALYSIS SESSION
A Data Analysis Session
Workshop: gnuplot
Further Reading
PART II Analytics: Modeling Data
7 GUESSTIMATION AND THE BACK OF THE ENVELOPE
Principles of Guesstimation
How Good Are Those Numbers?
Optional: A Closer Look at Perturbation Theory and Error Propagation
Workshop: The Gnu Scientific Library (GSL)
Further Reading
8 MODELS FROM SCALING ARGUMENTS
Models
Arguments from Scale
Mean-Field Approximations
Common Time-Evolution Scenarios
Case Study: How Many Servers Are Best?
Why Modeling?
Workshop: Sage
Further Reading
9 ARGUMENTS FROM PROBABILITY MODELS
The Binomial Distribution and Bernoulli Trials
The Gaussian Distribution and the Central Limit Theorem
Power-Law Distributions and Non-Normal Statistics
Other Distributions
Optional: Case Study--Unique Visitors over Time
Workshop: Power-Law Distributions
Further Reading
10 WHAT YOU REALLY NEED TO KNOW ABOUT CLASSICAL STATISTICS
Genesis
Statistics Defined
Statistics Explained
Controlled Experiments Versus Observational Studies
Optional: Bayesian Statistics--The Other Point of View
Workshop: R
Further Reading
11 INTERMEZZO: MYTHBUSTING--BIGFOOT, LEAST SQUARES, AND ALL THAT
How to Average Averages
The Standard Deviation
Least Squares
Further Reading
PART III Computation: Mining Data
12 SIMULATIONS
A Warm-Up Question
Monte Carlo Simulations
Resampling Methods
Workshop: Discrete Event Simulations with SimPy
Further Reading
13 FINDING CLUSTERS
What Constitutes a Cluster?
Distance and Similarity Measures
Clustering Methods
Pre- and Postprocessing
Other Thoughts
A Special Case: Market Basket Analysis
A Word of Warning
Workshop: Pycluster and the C Clustering Library
Further Reading
14 SEEING THE FOREST FOR THE TREES: FINDING IMPORTANT ATTRIBUTES
Principal Component Analysis
Visual Techniques
Kohonen Maps
Workshop: PCA with R
Further Reading
15 INTERMEZZO: WHEN MORE IS DIFFERENT
A Horror Story
Some Suggestions
What About Map/Reduce?
Workshop: Generating Permutations
Further Reading
PART IV Applications: Using Data
16 REPORTING, BUSINESS INTELLIGENCE, AND DASHBOARDS
Business Intelligence
Corporate Metrics and Dashboards
Data Quality Issues
Workshop: Berkeley DB and SQLite
Further Reading
17 FINANCIAL CALCULATIONS AND MODELING
The Time Value of Money
Uncertainty in Planning and Opportunity Costs
Cost Concepts and Depreciation
Should You Care?
Is This All That Matters?
Workshop: The Newsvendor Problem
Further Reading
18 PREDICTIVE ANALYTICS
Introduction
Some Classification Terminology
Algorithms for Classification
The Process
The Secret Sauce
The Nature of Statistical Learning
Workshop: Two Do-It-Yourself Classifiers
Further Reading
19 EPILOGUE: FACTS ARE NOT REALITY
A PROGRAMMING ENVIRONMENTS FOR SCIENTIFIC COMPUTATION AND DATA ANALYSIS
Software Tools
A Catalog of Scientific Software
Writing Your Own
Further Reading
B RESULTS FROM CALCULUS
Common Functions
Calculus
Useful Tricks
Notation and Basic Math
Where to Go from Here
Further Reading
C WORKING WITH DATA
Sources for Data
Cleaning and Conditioning
Sampling
Data File Formats
The Care and Feeding of Your Data Zoo
Skills
Terminology
Further Reading
INDEX

Editor's Recommendation

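"Data Analysis with Open Source Tools (reprint edition)" by Philipp K. Janert shows you how to use graphics to describe data with one, two, or dozens of variables; develop conceptual models using back-of-the-envelope calculations as well as scaling and probability arguments; mine data with computationally intensive methods such as simulation and clustering; make your conclusions easier to understand through reports, dashboards, and other metrics programs; understand financial calculations, including the time value of money; use dimensionality reduction techniques or predictive analytics to overcome the challenges that arise in data analysis; and become familiar with the different open source programming environments for data analysis.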



User Reviews (3 total)

 
 

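  •   I really dislike reviews that don't address a book's content, such as "bought it for a colleague, who says it's great" or "the delivery was too slow"... completely worthless reviews. My own review also has little to do with the content and concerns only the book's physical condition: the book doesn't look new (did I buy a secondhand copy?), the glue at the top-left corner of the cover has split, and the bottom-right corner of the cover is badly creased. Still, the pages inside are fine and the print is clear, so it should be a genuine copy. As for the content, it is excellent, of course; anyone who buys this book for what it is already knows that.
  •   Fast delivery and very good quality. I don't understand much of it yet, but I'll work through it slowly.
  •   It's a pity that the math is fairly simple, but considering that it is in English, it is actually very good. You can get a complete grounding in the basic concepts of data mining along with practical experience.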
 
