Bioinformatics

Published: 2006-10  Publisher: Science Press  Author: [US] D.W. Mount  Pages: 582  Translator: Cao Zhiwei
Tags: none

Synopsis

  Current research in bioinformatics centers on the computational analysis of the large volumes of data generated by genome sequencing, proteomics, and array technologies. This book presents a rich set of computational methods for analyzing DNA, RNA, and protein data, and discusses the strengths, weaknesses, and application strategies of these methods for solving biological problems. The first edition was compiled from Dr. Mount's lecture notes and has been used as a textbook worldwide. The second edition has been comprehensively revised, with guided-reading material supplied by professional instructors, making it well suited to undergraduate and graduate teaching. The book is ideal study material for undergraduate and graduate students majoring in bioinformatics, and is also suitable for self-study by researchers and information specialists.

Table of Contents

CHAPTER 1 Historical Introduction and Overview
CHAPTER 2 Collecting and Storing Sequences in the Laboratory
CHAPTER 3 Alignment of Pairs of Sequences
CHAPTER 4 Introduction to Probability and Statistical Analysis of Sequence Alignments
CHAPTER 5 Multiple Sequence Alignment
CHAPTER 6 Sequence Database Searching for Similar Sequences
CHAPTER 7 Phylogenetic Prediction
CHAPTER 8 Prediction of RNA Secondary Structure
CHAPTER 9 Gene Prediction and Regulation
CHAPTER 10 Protein Classification and Structure Prediction
CHAPTER 11 Genome Analysis
CHAPTER 12 Bioinformatics Programming Using Perl and Perl Modules
CHAPTER 13 Analysis of Microarrays
Index

Excerpt

  The object is to adjust these parameters so that the model represents the observed variation in a group of related protein sequences. A model trained in this manner will provide a statistically probable msa of the sequences. One problem with HMMs is that the training set has to be quite large (50 or more sequences) to produce a useful model for the sequences. A difficulty in training the HMM is that many different parameters must be found (the amino acid distributions, the number and positions of insert and delete states, and the state transition frequencies add up to thousands of parameters) to obtain a suitable model, and the purpose of the prior and training data is to find a suitable estimate for all these parameters. When trying to make an alignment of short sequence fragments to produce a profile HMM, this problem is worsened because the amount of data for training the model is even further reduced.
  Algorithms for calculation of an HMM. As illustrated in Figure 5.16, the goal is to calculate the best HMM for a group of sequences by optimizing the transition probabilities between states and the amino acid compositions of each match state in the model. The sequences do not have to be aligned to use the method. Once a reasonable model length reflecting the expected length of the sequence alignment is chosen, the model is adjusted incrementally to predict the sequences. Several methods for training the model in this fashion have been described (Baldi et al. 1994; Krogh et al. 1994; Eddy et al. 1995; Eddy 1996; Hughey and Krogh 1996; Durbin et al. 1998). For example, the Baum-Welch algorithm, previously used in speech recognition methods, adjusts the parameters of HMMs for optimal matching to sequences, as discussed below.
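As a concrete illustration of the parameters being trained, a profile HMM can be held as a pair of arrays: one emission distribution per match state and one set of outgoing transition probabilities per state. The following is a minimal sketch with invented initial values (a three-match-state toy model, transitions favoring match-to-match as the text describes for the initial model), not the layout of any particular package:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"          # the 20 standard residues

def init_profile_hmm(n_match_states, seed=0):
    """Build initial parameters for a toy profile HMM.

    emissions[k]   : amino acid distribution of match state k
    transitions[k] : probabilities of leaving state k for the next
                     match (M), insert (I), or delete (D) state
    """
    rng = np.random.default_rng(seed)
    # Near-uniform starting emissions; training reshapes these to match
    # the observed residue variation at each alignment column.
    emissions = rng.dirichlet(np.ones(len(AMINO_ACIDS)) * 50, size=n_match_states)
    # Favor match-to-match transitions so the initial model prefers
    # aligned columns over insertions and deletions.
    transitions = np.tile([0.9, 0.05, 0.05], (n_match_states, 1))
    return emissions, transitions

emissions, transitions = init_profile_hmm(3)
print(emissions.shape, transitions.shape)     # (3, 20) (3, 3)
```

Training then amounts to re-estimating every row of both arrays from the sequence data, which is why the parameter count quickly reaches into the thousands for realistic model lengths.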
This HMM is developed as follows:
  1. The model is initialized with estimates of transition probabilities, the probability of moving from one state to another particular state in the model (e.g., the probability of moving from one match state to the next), and the amino acid composition for each match and insert state. If an initial alignment of the sequences is known, or some other kinds of data suggest which sequence positions are the same, these data may be used in the model. For other cases, the initial distribution of amino acids to be used in each state is described below. The initial transition probabilities that are chosen generally favor transitions from one match state, a part of the model that represents one column in an msa, to the next match state, representing the next column. The alternative of using transitions to insert and delete states, which would delete a position or add another sequence character, is less favored because this builds more uncertainty into the HMM sequence model.
  2. All possible paths through the model for generating each sequence in turn are examined. There are many possible such paths for each sequence. This procedure would normally require a huge amount of computation time. Fortunately, an algorithm, the forward-backward algorithm, reduces the number of computations to the number of steps in the model times the total length of the training sequences. This calculation provides a probability of the sequence, given all possible paths through the model, and, from this value, the probability of any particular path may be found. The Baum-Welch algorithm, referred to above, then counts the number of times a particular state-to-state transition is used and a particular amino acid is required by a particular match state to generate the corresponding sequence position.
  3. A new version of the HMM is produced that uses the results found in step 2 to generate new transition probabilities and match-insert state compositions.
  4. Steps 2 and 3 are repeated up to ten more times to train the model, until the parameters do not change significantly.
  5. The trained model is used to provide the most likely path for each sequence, as described in Figure 5.16. The algorithm used for this purpose, the Viterbi algorithm, does not have to go through all of the possible alignments of a given sequence to the HMM to find the most probable alignment, but instead can find the alignment by a dynamic programming technique very much like that used for the alignment of two sequences, as discussed in Chapter 3. The collection of paths for the sequences provides an msa of the sequences with the corresponding match, insert, and delete states for each sequence. The columns in the msa are defined by the match states in the HMM such that amino acids from a particular match state are placed in the same column. For columns that do not correspond to a match state, a gap is added.
  6. The HMM may be used to search a sequence database for additional sequences that share the same sequence variation. In this case, the sum of the probabilities of all possible sequence alignments to the model is obtained. This probability is calculated by the forward component of the forward-backward algorithm described above in step 2. This analysis gives a type of distance score of the sequence from the model, thus providing an indication of how well a new sequence fits the model and whether the sequence may be related to the sequences used to train the model. In later derivations of HMMs, the score was divided by the length of the sequence because it was found to be length dependent. A z score giving the number of standard deviations of the sequence length-corrected score from the mean length-corrected score is therefore used (Durbin et al. 1998).
  Recall that for the Bayes block aligner, the initial or prior conditions were amino acid substitution matrices, block numbers, and alignments of the sequences.
The sequences were then used as new data to examine the model by producing scores for every possible combination of prior conditions. By using Bayes' rule, these data provided posterior probability distributions for all combinations of prior information. Similarly, the prior conditions of the HMM are the initial values given to the transition values and amino acid compositions. The sequences then provide new data for improving the model. Finally, the model provides a posterior probability distribution for the sequences and the maximum posterior probability for each sequence represented by a particular path through the model. This path provides the alignment of the sequence in the msa; i.e., the sequence plus matches, inserts, and deletes, as described in Figure 5.16. (Bayes' rule is discussed in Chapter 4, p. 148, along with related terms of conditional probability including prior and posterior probability.)
……
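The two dynamic-programming recurrences underlying the excerpt can be sketched for a generic HMM. The forward algorithm sums the probabilities of all state paths to score a sequence against the model (steps 2 and 6 above), while the Viterbi algorithm keeps only the single best path (step 5). The two-state model and observation sequence below are invented for illustration, not taken from the text:

```python
import numpy as np

def forward_score(obs, start_p, trans_p, emit_p):
    """Sum the probabilities of ALL state paths generating obs
    (the 'forward' half of the forward-backward algorithm)."""
    alpha = start_p * emit_p[:, obs[0]]
    for sym in obs[1:]:
        alpha = (alpha @ trans_p) * emit_p[:, sym]
    return float(alpha.sum())

def viterbi_path(obs, start_p, trans_p, emit_p):
    """Recover only the single most probable state path,
    by dynamic programming in log space to avoid underflow."""
    T, n = len(obs), len(start_p)
    logv = np.log(start_p) + np.log(emit_p[:, obs[0]])
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        scores = logv[:, None] + np.log(trans_p)   # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        logv = scores.max(axis=0) + np.log(emit_p[:, obs[t]])
    path = [int(logv.argmax())]
    for t in range(T - 1, 0, -1):                  # trace back the best predecessors
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy 2-state HMM over symbols {0, 1}; each state prefers one symbol.
start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
emit = np.array([[0.8, 0.2],
                 [0.2, 0.8]])
obs = [0, 0, 1, 1, 1]
print(viterbi_path(obs, start, trans, emit))       # [0, 0, 1, 1, 1]
print(forward_score(obs, start, trans, emit))
```

Note that the forward score is always at least the probability of the Viterbi path, since it sums over that path and every alternative; a profile HMM adds insert and delete states but uses the same recurrences.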

Book Cover

Book Tags

Reviews, Ratings, Reading, and Downloads


Bioinformatics PDF Download


User Reviews (28 in total)

 
 

  •   Hello! This book is quite good overall: in English, with a CD-ROM, very suitable for study and use. Delivery service was also good. However, I overlooked something when placing my order: could you issue me an invoice for this book? Amount: 61.00 yuan. Institute of Botany, School of Life Sciences, Lanzhou University. Thanks!
  •   An absolute classic textbook, essential for bioinformatics. Although mostly in English, it builds up gradually and is easy to understand.
  •   Essentially a reprint of the second English edition. A classic book; very happy with the purchase.
  •   A very good book! The explanations are detailed, but it is entirely in English.
  •   Rich and substantial content.
  •   A course textbook... absolutely comprehensive, and a genuine copy. Started studying for exams. So hard!
  •   The copy doesn't look new.
  •   Flipped through it briefly; almost entirely in English, somewhat difficult, will take time to digest.
  •   I originally wanted the English edition, but the genuine import is too expensive and there is no reprint edition. This odd version with only the first chapter in Chinese happens to suit me, haha. Objectively speaking, though, it reflects an irresponsible translator.
  •   The book itself is good, but it is essentially an English book. The translator only translated the first chapter, which borders on false advertising.
  •   Very good! Nothing to complain about!
  •   Not bad; an English original with a partial translation, quite suitable as a teaching reference for me. I just don't understand why both appear in one book.
  •   Feels like an all-English textbook, and far cheaper than the same title sold on amazon. The only regret is that the companion CD is not very substantial.
  •   I had read everyone's reviews before buying and wanted to improve my technical English, so I bought it anyway. As everyone says, only one chapter is translated, yet someone is still credited as compiler-translator. Unbelievable.
  •   This book is good; although in English, it is fairly easy to follow. But only one chapter was translated, yet it credits a compiler-translator. What a ripoff.
  •   The first chapter is translated; everything after that is in English.
  •   All in English; I can't fully understand it.
  •   Quite good, but a bit hard to read for someone like me whose English is poor.
  •   It's okay; it is just the English edition.
  •   Lots of content, but it is in English. I saw a translator listed and assumed it would be in Chinese; disappointing, but still acceptable.
  •   The book quality is fine; shipping was a bit slow.
  •   Great! Shopping here is quick and convenient, with timely delivery. Thumbs up!
  •   Too little of it is translated.
  •   Only one chapter translated, yet billed as a compiled translation. What a fraud. The book itself is good, though.
  •   Just so-so, mediocre; it has none of what I was hoping for.
  •   What does "compiled and translated" even mean? At this level... forget it. If he had translated more at this level, reading the original would be better anyway.
  •   Only one chapter translated, yet called a compiled translation; they really know how to fool people.
  •   Poor paper quality, and some damage.
 
