Programming Languages and Parallelism
The N-body problem in Vector Pascal, talk given at the workshop for Phase 2 of the SICSA multicore challenge.
Automatic Vectorising Compilation A talk given at Microsoft Labs (Jan 2011) describing our approach to data-parallel compilation at Glasgow University. It covers compilation from Fortran 90 and Vector Pascal to multi-core processors, both homogeneous ones and heterogeneous ones like the Cell.
SICSA multicore challenge page. Describes my entry into phase 1 of the challenge, a parallel programme to prepare concordances of text.
The SCC and the SICSA Multi-core Challenge This describes our entries into the first and second rounds of the SICSA multicore challenge using a variety of parallel architectures, including the 64-core SCC from Intel, and using two of our languages, Lino and Vector Pascal. The two problems were an N-body simulation and preparing a Bible concordance.
SICSA Benchmarks in C, talk at the SICSA Multi-core Challenge Workshop December 2010 reporting results of the challenge both on the SCC and other machines.
Challenging Multi-cores — talk given at the Scottish Programming Languages Society in Oct 2010 on Lino and the Intel SCC processor.
Genetics Algorithms and Instruction-sets The paper draws a close analogy between the evolution of organisms and the evolution of machine languages. It then looks at the possibility of using genetic algorithms to optimize the automatic construction of code generators. Experimental evidence is provided that the use of such algorithms can improve the quality of automatically constructed code generators.
Lino: a tiling language for arrays of processors Lino is a language for tiling large arrays of processors, in particular for multi-core. Lino is oriented to the coordination and communication aspects of multi-processing, and is otherwise implementation neutral, thus naturally facilitating the composition of large systems from heterogeneous software components. The need for Lino is motivated, and Lino's design and implementation are surveyed.
SIMD and Multi-core Parallelisation in an Imperative Language. Paper for the Advances In Programming Languages 2009 SICSA summer school. Describes the use of pure functions in an imperative language as a means to allow multi-core plus SIMD parallelism.
The Abstraction Mechanisms of Vector Pascal, Vector Pascal is a language designed to support the elegant and efficient expression of algorithms using the SIMD model of computation. It imports into Pascal abstraction mechanisms derived from functional languages having their origins in APL. In particular, it extends all operators to work on vectors of data. The type system is extended to handle pixels and dimensional analysis. Code generation is via the ILCG system that allows retargeting to multiple different SIMD instruction sets based on a formal description of the instruction set semantics.
SIMD and other optimizations in Vector Pascal Despite the widespread adoption of parallel operations in contemporary CPU designs, their use has been restricted by a lack of appropriate programming language abstractions
and development environments. Vector Pascal is a language designed to enable the elegant and efficient expression of SIMD algorithms. It imports into Pascal abstraction mechanisms derived from functional languages, in turn having their origins in APL. In particular, it extends all operators to work on vectors of data. Arithmetic is extended to support multi-media datatypes. Code generation is via the ILCG system that allows retargeting to multiple different SIMD instruction sets based on formalized descriptions of the instruction set semantics. We discuss some of the specific program transformations required for the efficient generation of SIMD Pascal code.
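The core idea of extending all operators to work on vectors can be illustrated outside Vector Pascal. The NumPy sketch below is an analogy only, not the Vector Pascal compiler: it shows the same data-parallel style in which one statement applies arithmetic across whole arrays.

```python
import numpy as np

# In Vector Pascal one writes  c := a + b * 2  over whole-array variables
# and the operators apply elementwise. NumPy gives Python the same
# data-parallel style (illustration only, not Vector Pascal itself).
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
c = a + b * 2    # one statement evaluated across the whole vector
print(c)         # [21. 42. 63. 84.]
```

A compiler for such a language can map the single array statement onto SIMD instructions or multiple cores without the programmer writing explicit loops.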
Automatic SIMD/MIMD compilation of tensor expressions A talk given at the tensor seminar in Edinburgh in September 2009
Array Languages and the Challenge of Modern Computer Architecture, a review of recent trends towards greater parallelism in processor design, arguing that only array languages can meet the challenges these pose.
Vector Pascal – an array language Paper for the APL2002 conference.
Vector Pascal. Manual Vector Pascal is a language designed to support the elegant and efficient expression of algorithms using the SIMD model of computation. It imports into Pascal features derived from the functional languages APL and J, in particular, the extension of all operators to work on vectors of data. The type system is extended to handle dimensional analysis. Code generation is via the ILCG system that allows retargeting to multiple different SIMD instruction sets based on the formal description of the instruction set semantics. For more details: see the SourceForge site of the system.
Final report: merging Single Assignment C and Vector Pascal technologies. Report on work done on EPSRC grant EP/D/032423/1.
Efficient compilation of array expressions Paper in the APL Quote Quad journal discussing how to compile APL like languages to vector machines.
Orthogonal Parallel Processing, a paper by Greg Michaelson and me on the technology used in Vector Pascal.
Array programming in Pascal A review of the previous array Pascals leads on to a description of the Glasgow Pascal compiler. The compiler is an ISO-Pascal superset with semantic extensions to translate data parallel statements to run on multiple SIMD cores. An appendix is given which includes demonstrations of the tool.
Direct compilation of high level languages to multi media instruction sets. Covers use of MMX, 3DNOW and SIMD instructions to obtain parallelism. Also available as HTML
A Process Algebra Approach to Grid Parallelism This was a research proposal developed by myself and others at the University of Glasgow to handle high degrees of parallelism on the grid.
The PGP grid Project. A report on JPI and on the parallelization of stereo matching.
The Jpi interface. A description of a proposed interface to GRID parallelism based on Milner’s Pi-Calculus.
Quantum Relational Databases The approach given by Grover can be generalised to set an upper complexity limit to the basic operations of relational databases on a quantum computer. Except in special cases where indices can be used on a classical machine, the quantum upper complexity limit is lower than the classical one.
No Mysteries: Davis and the myth of Hypercomputation, The topic of hypercomputation is arcane, to say the least. Why then have we spent considerable intellectual effort on it this last decade?
We argue against it because it is an unscientific reversion to idealist models of mathematics. By itself, that would be an issue only for computational theorists, but it becomes a philosophical and political matter when authors like Marciszewski and Bringsjord start to use ideas of hypercomputation as a basis for the defence of free market capitalism. We have specifically argued against the first author already. At this point we do not go into Bringsjord’s economic arguments; we concentrate on his attempt to establish the concept of hypercomputation – a sort of mystical calculation that human souls and the divine free market are alleged to be able to perform.
Materialism and delayed choice quantum eraser experiments, I have been producing a series of videos explaining materialism. Most of these have been accepted without controversy, but predictably enough, the last one, which was on quantum phenomena and materialism, raised a number of critical comments. The gist of my argument had been that the dominant Copenhagen interpretation of QM simply recapitulated the instrumentalist prejudices of the physicists of the early 20th century. The interpretation that, for instance, an electron or photon has no position until it is measured merely restates the founding assumption of instrumentalism – that science is about relations between subjective sense data perceived by the scientist. This was extended to say that it is about relations between instrument readings that the scientist perceives. This presents an argument against that standpoint in the context of an experiment that is usually taken as justifying it.
Marx, Darwin and Teleology, a discussion of the displacement of teleology by these materialists.
Quantum Parallelism and Scientific Realism, an article discussing how the advent of quantum computers will put into further question the Copenhagen Interpretation of QM.
Discoursing the multiverse discusses why arguments about multiverses may be a) relevant to the materialist theory of history and b) crucial to future technology.
In that, I argued that advances in quantum computing make the Everett interpretation of quantum mechanics more plausible. In part, this is unsurprising, since the whole field was set off by Deutsch, a firm supporter of Everett, in his groundbreaking Royal Society paper, back in 1984.
On Althusser’s Philosophy of the Encounter, The article reviews Althusser’s Philosophy of the Encounter, examining, in turn, the problem of the epistemological Break and the idea of materialisme aleatoire. It looks at Althusser’s critique of the concept of commodity fetishism and suggests a possible response. It goes on to situate the materialisme aleatoire in the context of the history of atomism with particular reference to the work of Boltzmann. It provides a possible technique for both rejecting teleology and retaining the arrow of time.
What is so Real about the Reals — talk about the real numbers given at the Computer Science Dept, April 2010
Non-classical computing: feasible versus infeasible.
Physics sets certain limits on what is and is not computable. These limits are very far from having been reached by current technologies. Whilst proposals for hypercomputation are almost certainly infeasible, there are a number of non-classical approaches that do hold considerable promise. There is a range of possible architectures that could be implemented on silicon that are distinctly different from the von Neumann model. Beyond this, quantum simulators, which are the quantum equivalent of analogue computers, may be constructible in the near future. Slides to accompany the talk.
A Hardware Relaxation Paradigm for Solving NP-Hard Problems Digital circuits with feedback loops can solve some instances of NP-hard problems
by relaxation: the circuit will either oscillate or settle down to a stable state that represents a solution to the problem instance. This approach differs from using hardware accelerators to speed up the execution of deterministic algorithms, as it exploits stabilisation properties of circuits with feedback, and it allows a variety of hardware techniques that do not have counterparts in software. A feedback circuit that solves many instances of Boolean satisfiability problems is described, with experimental results from a preliminary simulation using a hardware accelerator.
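The settle-or-oscillate behaviour described above can be mimicked in software. The sketch below is a toy analogue only, not the feedback circuit in the paper: it repeatedly flips a variable drawn from a random unsatisfied clause until the state stabilises on a satisfying assignment, or gives up after a step budget (standing in for perpetual oscillation).

```python
import random

def relax_sat(clauses, n_vars, steps=1000, seed=0):
    """Toy software analogue of relaxation search for SAT. Clauses are
    lists of literals, +i for variable i true, -i for false.
    (Illustration only; not the hardware design in the paper.)"""
    rng = random.Random(seed)
    state = [rng.choice([False, True]) for _ in range(n_vars)]

    def unsatisfied(clause):
        return not any(state[abs(lit) - 1] == (lit > 0) for lit in clause)

    for _ in range(steps):
        bad = [c for c in clauses if unsatisfied(c)]
        if not bad:
            return state            # stable state = solution found
        lit = rng.choice(rng.choice(bad))
        state[abs(lit) - 1] = not state[abs(lit) - 1]
    return None                     # still "oscillating" after the budget

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3]]
solution = relax_sat(clauses, 3)
```

Like the circuit, this process has no deterministic schedule: it either wanders into a fixed point that encodes a solution or keeps changing state.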
To Infinity and Beyond, joint paper with Greg Michaelson and Lewis Mackenzie, Many attempts to transcend the fundamental limitations to computability implied by the Halting Problem for Turing Machines depend on the use of forms of hypercomputation that draw on notions of infinite or continuous, as opposed to bounded or discrete, computation. Thus, such schemes may include the deployment of actualised rather than potential infinities of physical resources, or of physical representations of real numbers to arbitrary precision. Here, we argue that such bases for hypercomputation are not materially realisable and so cannot constitute new forms of effective calculability. A slightly amended version of this has now appeared in the journal Theoretical Computer Science A.
Cantor diagonalisation and planning, by Greg Michaelson, Allin Cottrell and Paul Cockshott
Murphy (2006) recently argued that one could use the diagonal argument of the mathematician Cantor to elucidate issues that arose in the socialist calculation debate of the 1930s. We will here argue that Murphy’s argument has certain problems, both at the number-theoretic level and from the standpoint of economic realism.
Are there new models of computation, an essay with Greg Michaelson arguing against recent proposals for super Turing computation. A slightly amended version of this has now appeared in the Computer Journal.
Boettke Syntax and the Turing Test, a reply by Greg Michaelson, Allin Cottrell and me to a paper on Methodology by Boettke which he published in Cahiers d’Épistémologie.
Information, Work, and Meaning an introduction to information theory by Greg Michaelson and me, covering its relationship to thermodynamics and to classical political economy, with particular emphasis on how it applies to industrial mass production.
The meaning of work: reflections on Watt, Marx, Turing and a Bee The paper examines the concept of work in Watt and Marx. It asks what it is about the nature of reality that makes work possible. It approaches this via a digression on Marx’s fable of the Architect and the Bee and goes on to answer it in terms of the intellectual history of thermodynamics.
Von Neumann to Bloom: useful techniques with vector spaces, a seminar for post-grad students showing how use of vector spaces spans image processing, information retrieval, and economic growth theory.
2D Image Convolution using Three Parallel Programming Models on the Xeon Phi Image convolution is widely used for sharpening, blurring and edge detection. In this paper, we review two common algorithms for convolving a 2D image by a separable kernel (filter). After optimizing the naive codes using loop unrolling and SIMD vectorization, we choose the algorithm with better performance as the baseline for parallelization. We then compare the parallel performance of the optimized code using OpenMP, OpenCL and GPRM implementations on the Intel Xeon Phi. We also measure the effects of optimization techniques and demonstrate how they affect both sequential and parallel performance. Apart from comparing the code complexity as well as the performance of the chosen parallel programming models, we investigate the impact of a parallelization technique, task agglomeration, in GPRM.
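The separable-kernel algorithm the paper benchmarks can be sketched as two 1D passes. The Python below is illustrative only, not the paper's optimized Xeon Phi code: it shows why separability cuts the per-pixel cost of an m-by-n kernel from O(mn) multiplies to O(m+n).

```python
import numpy as np

def convolve_separable(img, kx, ky):
    """Convolve a 2D image with the separable kernel outer(ky, kx)
    as two 1D passes: rows with kx, then columns with ky.
    (A minimal sketch of the algorithm class, not the paper's code.)"""
    tmp = np.apply_along_axis(lambda r: np.convolve(r, kx, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, ky, mode='same'), 0, tmp)

# 3x3 box filter applied as two 3-tap passes
img = np.arange(25).reshape(5, 5)
out = convolve_separable(img, [1, 1, 1], [1, 1, 1])
```

For an interior pixel, the result equals the sum over its 3x3 neighbourhood, computed with six multiply-adds instead of nine.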
Vector Quantisation Based Image Enhancement, We present a new algorithm for rescaling images inspired by fractal coding. It uses a statistical model of the relationship between detail at different scales of the image to interpolate detail at one octave above the highest spatial frequency in the original image. We compare it with B-spline and bilinear interpolation techniques and show that it yields a sharper-looking rescaled image.
Parallel Image Processing – a tutorial on using Vector Pascal to perform basic image processing operations in a data-parallel manner.
Applying the Grid to 3D capture technology Paper on the use of networks of machines to process 3D TV information. We used this with our 3D TV studio.
The use of Image Algebra Paper that addresses the use of Image Algebra as a foundation for image processing in the context of the IP-Racine project. Provides a general introduction to Image Algebra for those unfamiliar with it.
CONFOCAL MICROSCOPIC IMAGE SEQUENCE COMPRESSION USING VECTOR QUANTIZATION AND 3D PYRAMIDS. Paper on the technique developed by Yegang Tao and me to efficiently encode microscope images for web distribution.
Experimental 3D TV studio, appeared in IEE Computer Vision and Graphics, 2003. It describes a 3D television studio using 24 synchronized video cameras that we developed at Glasgow.
MICROSCOPIC VOLUMETRIC IMAGE DATA COMPRESSION USING VECTOR QUANTIZATION AND 3D PYRAMIDS The 3D pyramid compressor project at the University of Glasgow has developed a compressor for images obtained from a CLSM device. The proposed method, combining an image pyramid coder with vector quantization techniques, performs well at compressing confocal volume image data. An experiment was conducted on several kinds of CLSM data, comparing the presented compressor with other well-known volume data compressors, such as MPEG. The results showed that the 3D pyramid compressor gave higher subjective and objective image quality and better minutiae preservation on reconstructed images at the same compression ratio.
A Resolution Independent Image Representation for Digital Cinema, standard approaches to image and movie representation are proving inadequate in the context of digital cinema. In particular, the use of the Discrete Cosine Transform (DCT), as found in popular video formats, results in visual artifacts that are unacceptable to the movie industry. This describes a novel type of image and movie representation designed to allow resolution-independent manipulation and achieve perceptually lossless compression.
MERGING MICROSCOPY COMPRESSION TECHNIQUES WITH CONTINUOUS IMAGE REPRESENTATION, looks at how techniques devised to compress confocal microscopy stacks could be used to compress video.
Fast Fractal Transform Method for Data Compression : rr-156-94 Describes an improved algorithm that speeds up fractal image compression by three orders of magnitude. Allows fractal compression of 512×512 24-bit colour images in less than 20 seconds. Published in Dr. Dobb's Journal #243
Algorithm for the fast construction of codebooks for hierarchical vector quantization We present a recursive algorithm designed to construct quantization tables and codebooks for the hierarchical vector quantization of images. The algorithm is computationally inexpensive and yields high quality codebooks. hvq/codebooks.html
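For readers unfamiliar with codebook construction, the sketch below builds a VQ codebook by plain Lloyd's (k-means) iteration. This is the generic baseline technique, not the paper's recursive hierarchical algorithm, which is designed to be computationally cheaper than approaches like this.

```python
import numpy as np

def build_codebook(vectors, k, iters=20, seed=0):
    """Build a VQ codebook by Lloyd's algorithm: alternately assign
    each training vector to its nearest codeword, then move each
    codeword to the mean of its assigned vectors.
    (Generic k-means sketch, not the paper's recursive algorithm.)"""
    rng = np.random.default_rng(seed)
    book = vectors[rng.choice(len(vectors), k, replace=False)].astype(float)
    for _ in range(iters):
        # squared distances from every vector to every codeword
        d = ((vectors[:, None, :] - book[None, :, :]) ** 2).sum(axis=2)
        idx = d.argmin(axis=1)
        for j in range(k):
            members = vectors[idx == j]
            if len(members):                 # skip empty cells
                book[j] = members.mean(axis=0)
    return book

# two well-separated 2D clusters; the codewords settle near the means
train = np.array([[0., 0.], [0.2, 0.], [0., 0.2],
                  [10., 10.], [10.2, 10.], [10., 10.2]])
codebook = build_codebook(train, k=2)
```

Encoding then replaces each image block by the index of its nearest codeword, which is where the compression comes from.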
A review of the image compression work that I was involved in at the University of Strathclyde.
Persistence and databases
DPMI Persistent Object Store (DPOS) A design for the underlying persistent memory system that was designed for the Hibase database system.
Design for new Hibase, Hibase was a high performance compressed relational database that I and others patented whilst at the University of Strathclyde. It was later spun out to commercial use. This is an improved design produced a few years later.
An approach to persistent programming This paper presents the identification of a new programming language concept and reports our initial investigations of its utility. The concept is to identify persistence as an orthogonal property of data, independent of data type and the way in which data is manipulated. This is expressed by the principle that all data objects, independent of their data type, should have the same rights to persistence or transience. We expect to achieve persistent independent programming, so that the same code is applicable to data of any persistence. We have designed a language PS-algol by using these ideas and constructed a number of implementations. The experience gained is reported here, as a step in the task of achieving languages with proper accommodation for persistent programming.
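The principle of orthogonal persistence, that the same code should work on transient and persistent data alike, can be loosely illustrated in Python, where a transient dict and a persistent shelve store share one mapping interface. This is an analogy only: PS-algol makes persistence fully transparent, rather than mediated by an explicitly opened store.

```python
import os
import shelve
import tempfile

def tally(store, word):
    """Persistence-independent code: works unchanged on a transient
    dict or a persistent shelve mapping."""
    store[word] = store.get(word, 0) + 1
    return store[word]

# transient data: gone when the program exits
transient = {}
tally(transient, 'apl')

# persistent data: survives beyond this run of the program
path = os.path.join(tempfile.mkdtemp(), 'heap')
with shelve.open(path) as persistent:
    tally(persistent, 'apl')
```

The point of the paper's principle is precisely that `tally` needs no knowledge of which kind of store it was given.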
CMS—A chunk management system The chunk manager provides simple transaction management, concurrency control and allocates, administers and retrieves apparently contiguous chunks of data of arbitrary and varying size on disc. This system is designed to permit students and research workers to rapidly assemble and test their own DBMS, supporting any data model.
Algorithms for a persistent heap PS‐Algol is a dialect of Algol for the programming of problems that would normally require a database management system. It supports a persistent heap and an associative store, and has features embedded in the language to support tasks normally carried out by filing systems or database management systems.
PS-Algol: A language for persistent programming PS-algol is the first language in a family that introduces the concept of persistence as a property of data. The persistence of data is the time extent over which the data may be used. This paper introduces the general design principles for persistent programming languages and describes our first attempt, PS-algol.
Orthogonal persistence: an abstract representation of persistent storage in Algol-like languages This was my PhD thesis on the design of the language PS-algol with persistent virtual memory.
Addressing mechanisms and persistent programming The question of addressing mechanisms is at the basis of all implementations of persistent programming. What persistent programming enables us to do is store complex data structures by virtue of being able to preserve referential structure. What I want to look at is what different forms of addressing can be used to support persistence.
Layered implementations of persistent object stores The potential of persistent object stores as a replacement for filestores is identified. One of the obstacles to this happening is the lack of standardized object representations. The problems of standardization are discussed and a 7-layer model proposed. A case is made out for using universal persistent identifiers to refer to persistent objects.
Implementing 128 Bit Persistent Addresses on 80×86 Processors The Intel architecture for the 80×86 series machines lends itself well to the implementation of persistent object-oriented languages. They are also by a wide margin the most commonly used CPUs in the world. Mass production has driven down the costs of machines using these processors and they thus make an appealing platform for experimentation. They have a model of the store which corresponds well to the 7 layer model of persistent memory proposed by the authors: with physical, paged and segmented memory interfaces.
Overview of the Space Machine, an FPGA computer I designed to execute cellular automata. The machine was extensible to form a large 2D array using the blue link ports that would connect to other boards.
Closeup showing the INMOS Transputer that acted as the control and configuration processor, and the link hardware that could be used to feed information into the FPGA array from the transputer.
Realising massively concurrent systems on the Space Machine : hdv-29-93
A description of the Space Machine and its use.
Use of high speed cellular automata to simulate road traffic : hdv-27-93 A presentation of how the space machine can be applied to road traffic simulation.
Building a microcomputer with associative virtual memory, this describes the Poppy, a 64-bit address space workstation I designed and built at Glasgow University in 1985 with funding from Acorn to support persistent programming languages. During the 1980s I worked on persistent programming – the idea of having a programming language in which the variables and heap are preserved over successive invocations of the programme. The machine had a memory management unit specifically tailored to persistent object-oriented programming, and mapped active pages to battery-backed RAM chips for transactional reliability.
At the time I saw this as part of a development programme to create networked machines suitable for continent wide economic planning. The machines were designed to have a distributed address space spread over a network. The Poppy was seen as a half scale prototype of a future 128 bit machine.
Overview of the whole Poppy computer.
Motherboard, with the DRAM bank at right and the battery-backed SRAM at the back.
The NS32000 microprocessor and associative string-match processor.
Closeup of the stable cache, with a 10-year retention period. There was provision for fitting more than 128K of stable cache.
Design of POMP—a Persistent Object Management Processor Pomp is a single board persistent object management processor designed to fit into a Sun computer. When attached to a Sun VMEbus machine it will allow that machine to create and manipulate a graph of persistent objects. These objects will exist over the lifetime of the hardware, and beyond that provided that other machines capable of reading the archive media exist.
A low-cost text retrieval machine The paper introduces the design and implementation of a text retrieval hardware unit. After surveying a number of hardware text retrieval packages, it proposes a linguistic approach, in which the retrieval is treated as a regular expression which can be recognised by a finite state machine. The machine was built in prototype form but never put into mass production.
Parsing instruction set computers A class of processors is introduced with instruction sets that are specialized for parsing. The types of instructions required to parse regular expressions and context-free grammars are identified. Possible uses of parsing instruction set computers in the implementation of transducers are discussed. Two experimental parsing instruction set computers intended for text retrieval that I designed and built when working at Memex are described.
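The regular-expression-as-finite-state-machine idea behind both machines can be sketched in a few lines. The DFA below is a hand-built toy whose pattern and transition table are invented for illustration; it is not the hardware's query compiler, but it shows the single-pass, table-driven scan that such hardware performs.

```python
# Toy DFA scanning a character stream for the pattern "cat|car".
# States: 0 start, 1 after 'c', 2 after 'ca', 3 accept.
delta = {(0, 'c'): 1, (1, 'a'): 2, (2, 't'): 3, (2, 'r'): 3}

def scan(text):
    """Single left-to-right pass over the stream; on a missing
    transition, fall back to what the current character gives
    from the start state."""
    hits, s = [], 0
    for i, ch in enumerate(text):
        s = delta.get((s, ch), delta.get((0, ch), 0))
        if s == 3:                 # accept state: a match ends here
            hits.append(i - 2)     # both alternatives are 3 chars long
            s = 0
    return hits
```

Because the matcher touches each input character exactly once and needs only a table lookup per character, it maps naturally onto a hardware state machine running at memory bandwidth.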
A follow-on report on the Gala motherboards, more analysis of why the Gala motherboards performed the way they did.
IBM Case with respect to DMA controllers. Continued expert reports on IBM motherboards.
Defence against Vault 7 attacks, a paper explaining how microprocessors with permutable instruction sets could be used to protect against code injection and other virus attacks.
A simple compiler course – a course given at the University of Strathclyde based on my compiler writer’s toolbox. Gives techniques for simple compilation into assembler for the x86.