M Tech Dissertations

Permanent URI for this collection: http://drsr.daiict.ac.in/handle/123456789/3

  • Open Access
    Design and Implementation of Low Power Superscalar Processors
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Kotawala, Fatema; Tatu, Aditya
    This thesis presents an 8-stage low power superscalar processor. The clock frequency that a single-core in-order processor can reach has hit an upper limit, so improving performance requires techniques such as deeper pipelining and Instruction Level Parallelism (ILP). Executing several instructions in parallel improves performance, but achieving low power at the same time is a challenge. The processor in this work is an 8-stage, 2-way superscalar design, which allows two instructions to execute and complete in parallel. It was designed in Verilog HDL, and front-end analysis was performed with Cadence Encounter RTL Compiler. Clock gating was applied to reduce power. The RTL was implemented with the Nangate 45 nm library (NLDM timing models). The processor operates correctly at 200 MHz, with a total power consumption of 51.5 mW.
  • Open Access
    High Performance Computing
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2016) Patel, Jaykumar; Bhatt, Amit
    Technology has evolved from mainframe computers to laptops to small, smart handheld devices. These devices contain powerful 64-bit ARM processors that balance power and performance well, so it is important to use their capabilities to the fullest. This thesis on High Performance Computing explores running computationally intensive tasks on clusters built from readily available commodity off-the-shelf hardware. High performance computing clusters were built from Qualcomm embedded boards running Linux (Linaro), Linux desktop computers, and Windows desktop computers. High Performance Linpack (HPL) benchmarks were run on the Linux-based clusters and the expected speedup was obtained. This shows that a very cheap, high-throughput computing cluster can be formed from devices available around us, without a specialized supercomputing environment. The idea of cluster formation was then extended to the cloud: the same kind of computing cluster was created on Amazon cloud services and its throughput was successfully tested with HPL benchmarks. The next step was to integrate the local clusters with the cloud clusters. Computationally and power intensive live video processing tasks are then handled locally first, and the cloud clusters are used in real time once the processing and battery power of the local devices is exhausted. Cluster formation can thus be extended to the IoT to address constraints such as limited battery power and processing capacity in applications like extensive video processing, modelling of complex mathematical equations, and computationally intensive machine learning algorithms.
  • Open Access
    Implementation of different branch prediction schemes on FabScalar generated superscalar processor
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2014) Patel, Jayesh; Bhatt, Amit
    The performance of a modern pipelined processor depends on a steady flow of useful instructions. A branch instruction disrupts the sequential flow of instructions by presenting multiple paths along which the program may proceed. By predicting the branch outcome early, a branch predictor allows the processor to continue fetching instructions from the predicted path. Computer architects try to squeeze more performance out of superscalar processors by increasing issue widths and pipeline depths, and as they do, the penalties due to branch instructions continue to rise. Because the branch misprediction penalty is high, branch prediction accuracy is a very important factor for superscalar processors. This study explores FabScalar, a tool that automatically generates superscalar cores of different pipeline widths, depths and sizes and provides the RTL code of the desired core. A four-issue-wide superscalar core is generated with FabScalar. Three dynamic branch prediction techniques are implemented and compared on it: the Bimodal Branch Predictor, the Two-way Correlating Branch Predictor and the Hybrid Branch Predictor.
  • Open Access
    Migration in initial and dynamic virtual machine placement algorithms
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2013) Reddy, Narender A.; Chaudhary, Sanjay
    Cloud computing provides a platform that meets users' computing demands efficiently. Clouds use virtualization technologies to make efficient use of hardware: virtual machines satisfy user needs and are placed on the cloud's physical machines so that hardware resources and electricity are used effectively. Minimizing the number of physical machines in use can cut power consumption substantially, so an optimal placement maps virtual machines to physical machines such that the number of required physical machines is minimized. The virtual machine placement problem aims to minimize the total energy consumed by running physical machines, which also indicates higher resource utilization and lower data-center cost. Physical resources are multi-dimensional, so resources are always wasted when the different dimensions are used unevenly. To characterize the multi-dimensional resource usage state of physical machines, a multi-dimensional normalized resource cube is presented. Based on this model, we propose a virtual machine placement algorithm with migration support. It balances the utilization of multi-dimensional resources, reduces the number of running physical machines, and thus lowers energy consumption. We evaluate the proposed algorithm through extensive simulations and experiments in CloudSim. Experimental results show that, over the long run, the proposed algorithm saves up to 15% more energy than the other algorithms.
  • Open Access
    Implementation of high speed serial communication blocks
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2010) Parekh, Devang Tarunkumar; Dubey, Rahul
    Serial communication is widely used, from PCs to handheld mobile phones, because compared with parallel communication it needs less hardware, costs less, and is easier to design. For reliable bit-by-bit transmission and reception at the physical layer, data sequences should have high transition density, low power spectral density, a low bit error rate and reduced bandwidth. This thesis implements the USB 2.0 Transceiver Macrocell Interface (UTMI) in a generic form so that its low-level signalling protocol blocks can be reused in other serial communication standards. The code is synthesizable and has been verified for correct functionality. The HDL code implements the UTMI specification as a state machine (Mealy machine). The most challenging part of the work was the clock and data recovery block, as it involves engineering concepts from control theory, digital electronics and analog circuits. The work presents the intricacies of designing the PLL used to recover clock and data. UTMI speeds up ASIC development and provides an abstraction layer for peripheral developers who do not deal with the low-level details of the physical layer. Finally, the results of the UTMI implementation are presented.
  • Open Access
    Realization of FPGA based digital controller
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2010) Patel, Amit; Dubey, Rahul
    Field Programmable Gate Arrays (FPGAs) can enhance the efficiency and flexibility of digital controllers. Implementing digital controllers on FPGAs yields real-time realizations with small size and high speed. It also offers advantages such as complex functionality, fast computation, and low power consumption for high-volume production. This thesis presents FPGA-based speed control of a brushless DC (BLDC) motor, a real-time application. The construction and operation of the BLDC motor are described. Different speed control strategies and a digital pulse width modulation (PWM) control technique are implemented, tested on a BLDC motor, and evaluated. A Proportional Integral (PI) controller and a Fuzzy Logic Controller (FLC) are implemented in the FPGA as digital controllers. The PI controller is governed by its proportional and integral gains, while the FLC mimics the way a human operator controls the system. The logic of both controllers is written in a Hardware Description Language (HDL). When the plant is completely known, the PI controller performs better than the fuzzy logic controller.
  • Open Access
    Design of a high speed I/O buffer
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2008) Rathore, Akhil; Parikh, Chetan D.
    In high speed serial data transmission, the output buffer is the bottleneck. Current Mode Logic (CML) buffers have gained wide acceptance in most high speed serial interfaces because they reach speeds on the order of Gb/s. CML buffers achieve high speed because their low output voltage swing reduces transition time. CML buffers are currently designed with a differential architecture and use bandwidth extension techniques (inductive peaking, negative Miller capacitance, active feedback) to increase speed. At high frequency, input-output coupling through the gate-to-drain capacitance limits the bandwidth because of the Miller effect. The proposed design uses an architecture that reduces the Miller effect and hence achieves high bandwidth. In this topology a source follower drives a common-gate stage. This is an example of a ‘unilateral’ amplifier, that is, one in which the signal can flow in only one direction over a large bandwidth, and it reduces unintended feedback. The CML buffer is designed for OC-192/STM-64 applications, for use in the limiting amplifier, a critical block in optical systems. OC-192/STM-64 operates at about 10 Gb/s.
  • Open Access
    Improvement of tagged architecture for preventing software vulnerabilities
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2008) Shah, Tejaskumar; Mathuria, Anish M.
    Despite many defense techniques, software vulnerabilities such as buffer overflows, format string vulnerabilities and integer vulnerabilities are still exploited by attackers. These vulnerabilities arise from programming mistakes that allow security bugs to be exploited. A buffer overflow occurs when more data is written to a buffer than it can hold. A format string vulnerability arises when attacker-supplied data is passed to a formatting function as the format string argument. An integer vulnerability occurs when a program evaluates an integer to an unexpected value because of integer overflows, underflows, truncation errors or signedness conversion errors. A hardware-based solution called the tagged architecture protects a system against these vulnerabilities. In the tagged architecture, each memory byte is extended with one tag bit that marks data coming from I/O. Whenever I/O-supplied data is used to transfer control or to access memory, an alert is raised and the program is terminated. This thesis exposes weaknesses of the tagged architecture by finding false positives and false negatives in it. It also proposes improvements that avoid the false positives found. The improved tagged architecture is prototyped in the SimpleScalar architectural simulator. Security is evaluated for both the original and the improved tagged architecture using benchmarks and synthetic vulnerable programs.
  • Open Access
    Design methodology for architecting application specific instruction set processor
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2007) Desai, Meghana; Dubey, Rahul
    Application Specific Instruction-set Processors (ASIPs), also referred to as extensible processors, represent the state of the art in microprocessor architecture. ASIPs are making the System-on-a-Chip (SoC) concept practical, since a processor customised for an application can easily be integrated into an SoC as a pre-designed and pre-verified soft RTL block. The most significant and challenging part of designing these flexible, programmable processors is the design methodology. The challenge lies in providing a simple, configurable design space that yields optimised, efficient, customised application-specific processor hardware within a very short design cycle. The bottleneck of a processor is chiefly the data path. Its computationally intensive functional units account for most of the hardware area and largely determine timing. In an ASIP as well, the data path must be modified to meet the requirements. Current electronic design automation (EDA) tools are intelligent and, if exploited well, can provide various design optimizations. The implemented design approach builds on these observations: careful selection of data path elements, a distributed control path, and use of the built-in functionality of EDA tools to generate a user-defined architecture. In this project, non-pipelined and five-stage pipelined processor fabrics with configurable parameters are implemented. A library of basic arithmetic functional units is created, from which a component with the desired characteristics is selected and integrated into the data path. The modified processor core is synthesized under a set of constraints to achieve the required trade-off between area, power and timing. The multi-supply voltage feature of the synthesis tool is exploited to achieve timing closure of the generated processor core.
  • Open Access
    Low power microprocessor design
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2007) Bhatt, Vishal; Dubey, Rahul
    This work aims to reduce the power consumption of a processor with signal processing features. The low power design focuses on developing a ‘Low power synthesizable Register File’, as an initial study showed significant potential benefit from doing so. Two techniques are proposed and implemented: (1) Compiler Driven Register Access (CDRA) and (2) Register Windowing, an extension of an earlier technique called ‘Register Isolation’. The benchmarks used to evaluate the design's power consumption and performance simulate conditions that the processor encounters in control and DSP applications. After applying the low power techniques, the average power reduction across benchmarks is 1.5% and the maximum is 2.6%. Both figures are relative to the base processor, a customized version of the MIPS architecture with signal processing capability.
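For the low power superscalar processor dissertation (Kotawala, 2018), the textbook first-order model of CMOS dynamic power shows why clock gating helps. This is a general relation, not a derivation taken from the thesis. Here $\alpha$ is the switching activity, $C_L$ the switched capacitance, and $g$ an assumed fraction of cycles during which the clock to a gated register bank with clock capacitance $C_{\text{clk}}$ is disabled:

```latex
P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
\qquad
P_{\text{clk}} \approx (1 - g)\, C_{\text{clk}}\, V_{DD}^{2}\, f
```

The second expression uses $\alpha = 1$ for a free-running clock net. It ignores the overhead of the gating cells and leakage power, so it only gives a rough estimate.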
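For the High Performance Computing dissertation (Patel, 2016), "expected speedup" is usually judged with the metrics below: speedup $S = T_1/T_p$, parallel efficiency $E = S/p$, and Amdahl's law as an upper bound. This minimal sketch assumes wall-clock run times are available. The timings in the example are hypothetical, not HPL results from the thesis, which reports HPL throughput rather than these exact helpers.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Measured speedup S = T_1 / T_p."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, nodes: int) -> float:
    """Parallel efficiency E = S / p for p cluster nodes."""
    return speedup(t_serial, t_parallel) / nodes


def amdahl_bound(parallel_fraction: float, nodes: int) -> float:
    """Amdahl's law: best possible speedup when only a fraction f of the work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / nodes)


# Hypothetical timings in seconds (not measurements from the thesis).
print(speedup(120.0, 40.0))        # 3.0
print(efficiency(120.0, 40.0, 4))  # 0.75
```

Comparing the measured speedup against `amdahl_bound` indicates how much of the gap comes from serial work rather than from network overhead between the boards.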
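For the branch prediction dissertation (Patel, 2014), the three schemes can be sketched behaviourally using their common textbook forms:

- a bimodal table of 2-bit saturating counters;
- an (m, 2) correlating predictor that uses m bits of global history (m = 2 assumed here for "two-way");
- a tournament-style hybrid with a per-PC chooser.

Table sizes, initial counter values and the 4-byte PC alignment are assumptions. This is not the FabScalar RTL used in the thesis.

```python
def _bump(counter: int, taken: bool) -> int:
    """Update a 2-bit saturating counter (0..3); values 2 and 3 mean 'taken'."""
    return min(3, counter + 1) if taken else max(0, counter - 1)


class Bimodal:
    """Table of 2-bit counters indexed by low-order PC bits."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # start weakly not-taken

    def _idx(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # drop the byte offset of 4-byte instructions

    def predict(self, pc: int) -> bool:
        return self.table[self._idx(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._idx(pc)
        self.table[i] = _bump(self.table[i], taken)


class Correlating:
    """(m, 2) correlating predictor: m bits of global history pick one of
    2**m counters for each PC-indexed entry."""

    def __init__(self, index_bits: int = 8, history_bits: int = 2):
        self.index_bits = index_bits
        self.history_bits = history_bits
        self.history = 0
        self.table = [1] * (1 << (index_bits + history_bits))

    def _idx(self, pc: int) -> int:
        pc_part = (pc >> 2) & ((1 << self.index_bits) - 1)
        return (pc_part << self.history_bits) | self.history

    def predict(self, pc: int) -> bool:
        return self.table[self._idx(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._idx(pc)
        self.table[i] = _bump(self.table[i], taken)
        hist_mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & hist_mask


class Hybrid:
    """Tournament predictor: a per-PC 2-bit chooser selects bimodal (< 2)
    or correlating (>= 2) and is trained only when the two disagree."""

    def __init__(self, index_bits: int = 10):
        self.bimodal = Bimodal(index_bits)
        self.correlating = Correlating(index_bits - 2)
        self.mask = (1 << index_bits) - 1
        self.chooser = [1] * (1 << index_bits)

    def predict(self, pc: int) -> bool:
        use_corr = self.chooser[(pc >> 2) & self.mask] >= 2
        return self.correlating.predict(pc) if use_corr else self.bimodal.predict(pc)

    def update(self, pc: int, taken: bool) -> None:
        i = (pc >> 2) & self.mask
        bimodal_ok = self.bimodal.predict(pc) == taken
        corr_ok = self.correlating.predict(pc) == taken
        if bimodal_ok != corr_ok:
            self.chooser[i] = _bump(self.chooser[i], corr_ok)
        self.bimodal.update(pc, taken)
        self.correlating.update(pc, taken)


def accuracy(predictor, trace) -> float:
    """Fraction of correct predictions over a list of (pc, taken) pairs."""
    hits = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(trace)
```

Consider a strictly alternating branch such as `[(0x400, i % 2 == 0) for i in range(200)]`. From this initial state the bimodal table mispredicts every time. The history-based predictors learn the pattern after a few branches. This illustrates why correlation helps, but it does not reproduce the thesis's benchmark results.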
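For the virtual machine placement dissertation (Reddy, 2013), the idea of balancing normalized multi-dimensional utilization can be illustrated with a simple greedy heuristic. This is a hypothetical sketch, not the thesis's algorithm, and it omits migration. It makes the following assumptions:

- every physical machine (PM) has capacity 1.0 in each dimension;
- imbalance is measured as the spread between a PM's most- and least-used dimensions;
- each VM goes to the running PM that stays most balanced after placement.

```python
from typing import List, Optional, Sequence, Tuple

Demand = Tuple[float, ...]  # normalized demand per resource dimension (e.g. CPU, RAM)


def imbalance(util: Sequence[float]) -> float:
    """Spread between the most- and least-used dimensions; 0 means the
    utilization point lies on the main diagonal of the resource cube."""
    return max(util) - min(util)


def fits(util: Sequence[float], demand: Sequence[float]) -> bool:
    """Each PM has normalized capacity 1.0 in every dimension."""
    return all(u + d <= 1.0 + 1e-9 for u, d in zip(util, demand))


def place(vms: Sequence[Demand]) -> List[List[int]]:
    """Greedy balance-aware placement; returns the VM indices hosted on each PM."""
    dims = len(vms[0]) if vms else 0
    pm_util: List[List[float]] = []
    pm_vms: List[List[int]] = []
    # Place the largest VMs first (by their most demanding dimension).
    for i in sorted(range(len(vms)), key=lambda k: max(vms[k]), reverse=True):
        best: Optional[int] = None
        best_score = float("inf")
        for p, util in enumerate(pm_util):
            if not fits(util, vms[i]):
                continue
            score = imbalance([u + d for u, d in zip(util, vms[i])])
            if score < best_score:
                best, best_score = p, score
        if best is None:  # no running PM can host the VM: power on a new one
            pm_util.append([0.0] * dims)
            pm_vms.append([])
            best = len(pm_util) - 1
        pm_util[best] = [u + d for u, d in zip(pm_util[best], vms[i])]
        pm_vms[best].append(i)
    return pm_vms
```

With demands `[(0.6, 0.2), (0.2, 0.6), (0.4, 0.4), (0.4, 0.4)]`, the two complementary VMs share one PM and the two balanced VMs share another, so only two PMs run.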
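For the high speed serial communication dissertation (Parekh, 2010), two USB 2.0 line-coding steps illustrate how high transition density is maintained:

- NRZI encoding, in which a 0 bit toggles the line level and a 1 bit leaves it unchanged;
- bit stuffing, which inserts a 0 after six consecutive 1s so that long runs still produce transitions.

This behavioural Python sketch is not the thesis's HDL. The function names and the starting line level are assumptions.

```python
from typing import List


def bit_stuff(bits: List[int]) -> List[int]:
    """Insert a 0 after every run of six consecutive 1s."""
    out: List[int] = []
    ones = 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == 6:
            out.append(0)  # stuffed bit forces a transition on the line
            ones = 0
    return out


def bit_unstuff(bits: List[int]) -> List[int]:
    """Remove the 0 that follows every run of six 1s (no error checking)."""
    out: List[int] = []
    ones = 0
    skip = False
    for b in bits:
        if skip:
            skip = False
            ones = 0
            continue
        out.append(b)
        ones = ones + 1 if b == 1 else 0
        if ones == 6:
            skip = True
    return out


def nrzi_encode(bits: List[int], initial_level: int = 1) -> List[int]:
    """NRZI: a 0 toggles the line level, a 1 leaves it unchanged."""
    level = initial_level
    out: List[int] = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels: List[int], initial_level: int = 1) -> List[int]:
    """Inverse of nrzi_encode: no change -> 1, change -> 0."""
    prev = initial_level
    out: List[int] = []
    for level in levels:
        out.append(1 if level == prev else 0)
        prev = level
    return out
```

On the transmit side, bits are stuffed first and then NRZI-encoded. The receive side reverses the order: it NRZI-decodes the line levels, then removes the stuffed bits.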
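For the FPGA digital controller dissertation (Patel, 2010), the discrete-time PI speed controller can be sketched behaviourally in Python. The thesis implements its controllers in HDL, and their internal structure is not given here. The following are assumptions for illustration, not the thesis's design:

- the class name `PIController`;
- the gains and the duty-cycle limits of 0 to 1;
- clamping anti-windup, in which the integral is frozen while the output saturates.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete-time PI controller with output clamping (hypothetical helper)."""
    kp: float                # proportional gain
    ki: float                # integral gain
    dt: float                # sampling period in seconds
    out_min: float = 0.0     # assumed PWM duty-cycle limits
    out_max: float = 1.0
    integral: float = 0.0    # accumulated integral term

    def step(self, setpoint: float, measured: float) -> float:
        """Return the new control output (e.g. PWM duty cycle) for one sample."""
        error = setpoint - measured
        candidate = self.integral + self.ki * error * self.dt
        u = self.kp * error + candidate
        if u > self.out_max:
            return self.out_max       # saturated: freeze the integral (anti-windup)
        if u < self.out_min:
            return self.out_min
        self.integral = candidate     # integrate only while unsaturated
        return u
```

For example, `PIController(kp=0.001, ki=0.01, dt=0.001).step(1000.0, 900.0)` returns a duty cycle of about 0.101 for a 100 rpm speed error. An FPGA version would typically use fixed-point arithmetic instead of floats.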
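For the high speed I/O buffer dissertation (Rathore, 2008), the Miller effect and the CML output swing follow from standard small-signal relations. These are textbook approximations, not the thesis's own analysis. $C_{gs}$ and $C_{gd}$ are the gate-source and gate-drain capacitances, $g_m R_D$ is the magnitude of the common-source gain, and $I_{SS}$ is the tail current of the differential pair:

```latex
C_{\text{in,CS}} \approx C_{gs} + C_{gd}\,\bigl(1 + g_m R_D\bigr)
\qquad
\Delta V_{\text{out}} = I_{SS}\, R_D
```

In a common-gate stage the gate sits at AC ground, so $C_{gd}$ loads the output node instead of bridging input and output. Its Miller multiplication disappears, which is why the source-follower plus common-gate cascade behaves as a unilateral amplifier.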
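For the tagged architecture dissertation (Shah, 2008), the basic check can be modelled in a few lines. Data arriving from I/O carries a tag, tags propagate through arithmetic, and a tagged value used as a jump target or memory address raises an alert. This is a simplified, hypothetical sketch: it tags whole words rather than individual memory bytes and uses OR-propagation. The precise propagation rules, and the false positives and negatives they cause, are what the thesis studies and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict


class SecurityAlert(Exception):
    """Raised when I/O-derived data is used as a jump target or memory address."""


@dataclass(frozen=True)
class Word:
    value: int
    tag: bool = False  # True if the value (or anything it was computed from) came from I/O


def from_io(value: int) -> Word:
    """Data read from an input channel is tagged on arrival."""
    return Word(value, tag=True)


def alu_add(a: Word, b: Word) -> Word:
    """Tags propagate through arithmetic: the result is tagged if either input is."""
    return Word((a.value + b.value) & 0xFFFFFFFF, a.tag or b.tag)


def load(memory: Dict[int, Word], address: Word) -> Word:
    """Memory access through a tagged address is blocked."""
    if address.tag:
        raise SecurityAlert(f"tagged data used as address 0x{address.value:x}")
    return memory.get(address.value, Word(0))


def jump(target: Word) -> int:
    """Control transfer to a tagged target is blocked."""
    if target.tag:
        raise SecurityAlert(f"tagged data used as jump target 0x{target.value:x}")
    return target.value
```

Here `jump(alu_add(from_io(4), Word(0x1000)))` raises `SecurityAlert`, while untagged addresses and targets pass through unchanged.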
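For the ASIP design methodology dissertation (Desai, 2007), the step of picking a component with the desired characteristics from a functional-unit library can be illustrated as a constrained choice. The helper picks the smallest-area unit of the requested kind that meets a delay budget. The library entries, their area and delay figures, and the `select_unit` helper are hypothetical. The numbers are illustrative only, and the thesis relies on EDA synthesis to evaluate real components.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FunctionalUnit:
    name: str
    kind: str        # e.g. "adder" or "multiplier"
    area: float      # relative area units (hypothetical)
    delay_ns: float  # critical-path delay (hypothetical)


def select_unit(library: Sequence[FunctionalUnit], kind: str,
                max_delay_ns: float) -> Optional[FunctionalUnit]:
    """Smallest-area unit of the requested kind that meets the delay budget,
    or None if no library entry is fast enough."""
    candidates = [u for u in library if u.kind == kind and u.delay_ns <= max_delay_ns]
    return min(candidates, key=lambda u: u.area, default=None)


# Hypothetical library entries; the numbers are illustrative, not synthesis results.
LIBRARY = [
    FunctionalUnit("ripple_carry_adder", "adder", area=1.0, delay_ns=4.0),
    FunctionalUnit("carry_lookahead_adder", "adder", area=1.6, delay_ns=2.0),
    FunctionalUnit("kogge_stone_adder", "adder", area=2.5, delay_ns=1.2),
]
```

A relaxed budget such as 5 ns selects the small ripple-carry adder, while 2.5 ns forces the carry-lookahead adder. Tightening the budget further trades area for speed until no entry qualifies.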