OpenCL Programming Guide

Published: 2012-7  Publisher: Science Press  Author: Munshi  Pages: 603  Word count: 1003250

Summary

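The new OpenCL standard helps programs make full use of the abundant resources of processors such as CPUs and GPUs. It has been endorsed by companies including Apple, AMD, Intel, and IBM, and it has broad application prospects in servers, embedded devices, high-performance computing, and other fields.
OpenCL Programming Guide is co-written by five leading OpenCL authorities and covers the complete specification. Building on an analysis of key use cases, it explains how to express a wide range of parallel algorithms in OpenCL and provides complete reference material for the API and the OpenCL C language. Through complete case studies and code examples, it explains how to write complex parallel programs that split a workload across many different devices, and it also covers the key points of OpenCL performance optimization.
OpenCL Programming Guide is the first comprehensive, authoritative practical guide to the OpenCL 1.1 specification. It is suitable as reading and reference material for R&D staff and software architects in information technology.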

Table of Contents

Foreword
Preface

Part I The OpenCL 1.1 Language and API

1. An Introduction to OpenCL
    What Is OpenCL, or Why You Need This Book; Our Many-Core Future: Heterogeneous Platforms; Software in a Many-Core World; Conceptual Foundations of OpenCL; Platform Model; Execution Model; Memory Model; Programming Models; OpenCL and Graphics; The Contents of OpenCL; Platform API; Runtime API; Kernel Programming Language; OpenCL Summary; The Embedded Profile; Learning OpenCL
2. HelloWorld: An OpenCL Example
    Building the Examples; Prerequisites; Mac OS X and Code::Blocks; Microsoft Windows and Visual Studio; Linux and Eclipse; HelloWorld Example; Choosing an OpenCL Platform and Creating a Context; Choosing a Device and Creating a Command-Queue; Creating and Building a Program Object; Creating Kernel and Memory Objects; Executing a Kernel; Checking for Errors in OpenCL
3. Platforms, Contexts, and Devices
    OpenCL Platforms; OpenCL Devices; OpenCL Contexts
4. Programming with OpenCL C
    Writing a Data-Parallel Kernel Using OpenCL C; Scalar Data Types; The half Data Type; Vector Data Types; Vector Literals; Vector Components; Other Data Types; Derived Types; Implicit Type Conversions; Usual Arithmetic Conversions; Explicit Casts; Explicit Conversions; Reinterpreting Data as Another Type; Vector Operators; Arithmetic Operators; Relational and Equality Operators; Bitwise Operators; Logical Operators; Conditional Operator; Shift Operators; Unary Operators; Assignment Operator; Qualifiers; Function Qualifiers; Kernel Attribute Qualifiers; Address Space Qualifiers; Access Qualifiers; Type Qualifiers; Keywords; Preprocessor Directives and Macros; Pragma Directives; Macros; Restrictions
5. OpenCL C Built-In Functions
    Work-Item Functions; Math Functions; Floating-Point Pragmas; Floating-Point Constants; Relative Error as ulps; Integer Functions; Common Functions; Geometric Functions; Relational Functions; Vector Data Load and Store Functions; Synchronization Functions; Async Copy and Prefetch Functions; Atomic Functions; Miscellaneous Vector Functions; Image Read and Write Functions; Reading from an Image; Samplers; Determining the Border Color; Writing to an Image; Querying Image Information
6. Programs and Kernels
    Program and Kernel Object Overview; Program Objects; Creating and Building Programs; Program Build Options; Creating Programs from Binaries; Managing and Querying Programs; Kernel Objects; Creating Kernel Objects and Setting Kernel Arguments; Thread Safety; Managing and Querying Kernels
7. Buffers and Sub-Buffers
    Memory Objects, Buffers, and Sub-Buffers Overview; Creating Buffers and Sub-Buffers; Querying Buffers and Sub-Buffers; Reading, Writing, and Copying Buffers and Sub-Buffers; Mapping Buffers and Sub-Buffers
8. Images and Samplers
    Image and Sampler Object Overview; Creating Image Objects; Image Formats; Querying for Image Support; Creating Sampler Objects; OpenCL C Functions for Working with Images; Transferring Image Objects
9. Events
    Commands, Queues, and Events Overview; Events and Command-Queues; Event Objects; Generating Events on the Host; Events Impacting Execution on the Host; Using Events for Profiling; Events Inside Kernels; Events from Outside OpenCL
10. Interoperability with OpenGL
    OpenCL/OpenGL Sharing Overview; Querying for the OpenGL Sharing Extension; Initializing an OpenCL Context for OpenGL Interoperability; Creating OpenCL Buffers from OpenGL Buffers; Creating OpenCL Image Objects from OpenGL Textures; Querying Information about OpenGL Objects; Synchronization between OpenGL and OpenCL
11. Interoperability with Direct3D
    Direct3D/OpenCL Sharing Overview; Initializing an OpenCL Context for Direct3D Interoperability; Creating OpenCL Memory Objects from Direct3D Buffers and Textures; Acquiring and Releasing Direct3D Objects in OpenCL; Processing a Direct3D Texture in OpenCL; Processing D3D Vertex Data in OpenCL
12. C++ Wrapper API
    C++ Wrapper API Overview; C++ Wrapper API Exceptions; Vector Add Example Using the C++ Wrapper API; Choosing an OpenCL Platform and Creating a Context; Choosing a Device and Creating a Command-Queue; Creating and Building a Program Object; Creating Kernel and Memory Objects; Executing the Vector Add Kernel
13. OpenCL Embedded Profile
    OpenCL Profile Overview; 64-Bit Integers; Images; Built-In Atomic Functions; Mandated Minimum Single-Precision Floating-Point Capabilities; Determining the Profile Supported by a Device in an OpenCL C Program

Part II OpenCL 1.1 Case Studies

14. Image Histogram
    Computing an Image Histogram; Parallelizing the Image Histogram; Additional Optimizations to the Parallel Image Histogram; Computing Histograms with Half-Float or Float Values for Each Channel
15. Sobel Edge Detection Filter
    What Is a Sobel Edge Detection Filter?; Implementing the Sobel Filter as an OpenCL Kernel
16. Parallelizing Dijkstra's Single-Source Shortest-Path Graph Algorithm
    Graph Data Structures; Kernels; Leveraging Multiple Compute Devices
17. Cloth Simulation in the Bullet Physics SDK
    An Introduction to Cloth Simulation; Simulating the Soft Body; Executing the Simulation on the CPU; Changes Necessary for Basic GPU Execution; Two-Layered Batching; Optimizing for SIMD Computation and Local Memory; Adding OpenGL Interoperation
18. Simulating the Ocean with Fast Fourier Transform
    An Overview of the Ocean Application; Phillips Spectrum Generation; An OpenCL Discrete Fourier Transform; Determining 2D Decomposition; Using Local Memory; Determining the Sub-Transform Size; Determining the Work-Group Size; Obtaining the Twiddle Factors; Determining How Much Local Memory Is Needed; Avoiding Local Memory Bank Conflicts; Using Images; A Closer Look at the FFT Kernel; A Closer Look at the Transpose Kernel
19. Optical Flow
    Optical Flow Problem Overview; Sub-Pixel Accuracy with Hardware Linear Interpolation; Application of the Texture Cache; Using Local Memory; Early Exit and Hardware Scheduling; Efficient Visualization with OpenGL Interop; Performance
20. Using OpenCL with PyOpenCL
    Introducing PyOpenCL; Running the PyImageFilter2D Example; PyImageFilter2D Code; Context and Command-Queue Creation; Loading to an Image Object; Creating and Building a Program; Setting Kernel Arguments and Executing a Kernel; Reading the Results
21. Matrix Multiplication with OpenCL
    The Basic Matrix Multiplication Algorithm; A Direct Translation into OpenCL; Increasing the Amount of Work per Kernel; Optimizing Memory Movement: Local Memory; Performance Results and Optimizing the Original CPU Code
22. Sparse Matrix-Vector Multiplication
    Sparse Matrix-Vector Multiplication (SpMV) Algorithm; Description of This Implementation; Tiled and Packetized Sparse Matrix Representation; Header Structure; Tiled and Packetized Sparse Matrix Design Considerations; Optional Team Information; Tested Hardware Devices and Results; Additional Areas of Optimization

A. Summary of OpenCL 1.1
    The OpenCL Platform Layer; Contexts; Querying Platform Information and Devices; The OpenCL Runtime; Command-Queues; Buffer Objects; Create Buffer Objects; Read, Write, and Copy Buffer Objects; Map Buffer Objects; Manage Buffer Objects; Query Buffer Objects; Program Objects; Create Program Objects; Build Program Executable; Build Options; Query Program Objects; Unload the OpenCL Compiler; Kernel and Event Objects; Create Kernel Objects; Kernel Arguments and Object Queries; Execute Kernels; Event Objects; Out-of-Order Execution of Kernels and Memory Object Commands; Profiling Operations; Flush and Finish; Supported Data Types; Built-In Scalar Data Types; Built-In Vector Data Types; Other Built-In Data Types; Reserved Data Types; Vector Component Addressing; Preprocessor Directives and Macros; Specify Type Attributes; Math Constants; Work-Item Built-In Functions; Integer Built-In Functions; Common Built-In Functions; Math Built-In Functions; Geometric Built-In Functions; Relational Built-In Functions; Vector Data Load/Store Functions; Atomic Functions; Async Copies and Prefetch Functions; Synchronization, Explicit Memory Fence; Miscellaneous Vector Built-In Functions; Image Read and Write Built-In Functions; Vector Components; Vector Addressing Equivalencies; Conversions and Type Casting Examples; Operators; Address Space Qualifiers; Function Qualifiers; Image Objects; Create Image Objects; Query List of Supported Image Formats; Copy between Image, Buffer Objects; Map and Unmap Image Objects; Read, Write, Copy Image Objects; Query Image Objects; Image Formats; Access Qualifiers; Sampler Objects; Sampler Declaration Fields; OpenCL Device Architecture Diagram; OpenCL/OpenGL Sharing APIs; CL Buffer Objects > GL Buffer Objects; CL Image Objects > GL Textures; CL Image Objects > GL Renderbuffers; Query Information; Share Objects; CL Event Objects > GL Sync Objects; CL Context > GL Context, Sharegroup; OpenCL/Direct3D 10 Sharing APIs

Index

Chapter Excerpt

The solution to this problem is for the program object to be built from source at runtime. The host program defines the devices within the context. Only at that point is it possible to know how to compile the program source code to create the code for the kernels. As for the source code itself, OpenCL is quite flexible about the form. In many cases, it is a regular string either statically defined in the host program, loaded from a file at runtime, or dynamically generated inside the host program.

Our context now includes OpenCL devices and a program object from which the kernels are pulled for execution. Next we consider how the kernels interact with memory. The detailed memory model used by OpenCL will be described later. For the sake of our discussion of the context, we need to understand how the OpenCL memory works only at a high level. The crux of the matter is that on a heterogeneous platform, there are often multiple address spaces to manage. The host has the familiar address space expected on a CPU platform, but the devices may have a range of different memory architectures. To deal with this situation, OpenCL introduces the idea of memory objects. These are explicitly defined on the host and explicitly moved between the host and the OpenCL devices. This does put an extra burden on the programmer, but it lets us support a much wider range of platforms.

We now understand the context within an OpenCL application. The context is the OpenCL devices, program objects, kernels, and memory objects that a kernel uses when it executes. Now we can move on to how the host program issues commands to the OpenCL devices.

Command-Queues

The interaction between the host and the OpenCL devices occurs through commands posted by the host to the command-queue. These commands wait in the command-queue until they execute on the OpenCL device. A command-queue is created by the host and attached to a single OpenCL device after the context has been defined.
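
The three ideas in this excerpt (a program built from a source string at runtime, memory objects moved explicitly between host and device, and commands that wait in a queue attached to one device) can be illustrated with a toy model. The sketch below is plain Python and does not use the OpenCL API: Device, Context, CommandQueue, and every method name are hypothetical, and Python's built-in compile/exec merely stands in for a device compiler.

```python
from collections import deque


class Device:
    """Hypothetical stand-in for a device with its own address space."""

    def __init__(self, name):
        self.name = name
        self.memory = {}  # device-side storage, separate from host variables


class Context:
    """Toy context: devices plus a program built from source at runtime."""

    def __init__(self, devices):
        self.devices = list(devices)
        self.kernels = {}

    def build_program(self, source):
        # The source is an ordinary string. It is compiled only now, after
        # the devices in the context are known (compile/exec is a stand-in).
        namespace = {}
        exec(compile(source, "<kernel source>", "exec"), namespace)
        self.kernels = {name: obj for name, obj in namespace.items()
                        if callable(obj) and not name.startswith("_")}


class CommandQueue:
    """Toy in-order queue attached to exactly one device of a context."""

    def __init__(self, context, device):
        if device not in context.devices:
            raise ValueError("device is not part of the context")
        self.context = context
        self.device = device
        self.pending = deque()

    def enqueue_write(self, name, host_data):
        # Explicit host -> device transfer; the host data is snapshotted here.
        self.pending.append(("write", name, list(host_data)))

    def enqueue_kernel(self, kernel_name, *buffer_names):
        self.pending.append(("kernel", kernel_name, buffer_names))

    def enqueue_read(self, name, host_out):
        # Explicit device -> host transfer into a host-side list.
        self.pending.append(("read", name, host_out))

    def finish(self):
        # Commands wait in the queue until they execute on the device.
        mem = self.device.memory
        while self.pending:
            op, target, payload = self.pending.popleft()
            if op == "write":
                mem[target] = payload
            elif op == "kernel":
                self.context.kernels[target](*(mem[b] for b in payload))
            else:  # "read"
                payload[:] = mem[target]


KERNEL_SOURCE = """
def scale(buf):
    for i in range(len(buf)):
        buf[i] *= 2
"""

gpu = Device("toy-gpu")
ctx = Context([gpu])
ctx.build_program(KERNEL_SOURCE)
queue = CommandQueue(ctx, gpu)

result = []
queue.enqueue_write("a", [1, 2, 3])
queue.enqueue_kernel("scale", "a")
queue.enqueue_read("a", result)
queued_before_finish = len(queue.pending)  # nothing has run yet
queue.finish()
print(result)  # [2, 4, 6]
```

In this model nothing touches the device until finish() drains the queue, which mirrors the excerpt's point that commands wait in the command-queue until they execute. A real OpenCL host program would use the platform and runtime API for each of these steps.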

Editor's Recommendation

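Part of the series Outstanding Foreign Books in Information Science and Technology, OpenCL Programming Guide (English edition) is written for the latest OpenCL 1.1 specification. Co-written by five leading authorities in OpenCL technology, it is comprehensive and covers the complete specification. It offers a large number of use cases and code examples, along with a detailed and complete reference for the API and the OpenCL C language, which gives it strong practical and reference value.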

User Reviews (1 in total)

  •   If your English is good and you want to learn parallel programming, you really must read this book.
