Doctoral Thesis - Reducing overheads in a novel non-von Neumann architecture based computer (C, C++, Python, Verilog)

  • Developing a proof-of-concept behavioral emulator in Python and the Structural Simulation Toolkit, along with a cycle-accurate FPGA-based counterpart, for an active memory architecture, to run experiments on reducing the architectural overheads prevalent in conventional general-purpose hardware; the architecture is specialized for dynamic graph processing, with applications in AI and ML, n-body simulations, and Adaptive Mesh Refinement
  • Enhancing performance by reducing starvation, latency, overheads, and contention via a ParalleX-based execution model, hardware mechanisms for global namespace translation, adaptive routing and reordering in a message-based runtime system, and graph primitive operations
  • Projected an instance of CCA to yield a 600x peak performance improvement, a 300x increase in memory bandwidth, and a 95% reduction in physical footprint compared to Sunway TaihuLight

Master’s Thesis - High Performance Computing (OpenMP, MPI, C/C++)

  • Reduced time to solution by 90% with a message-driven runtime system (such as Charm++) or Graph500, based on a comparative analysis of scaling results for graph processing algorithms such as single-source shortest path (SSSP)
  • Cut graph processing execution time by 39% by creating a parallel variant of Dijkstra’s algorithm on shared-memory processors using OpenMP, MPI, and the Parallel Boost Graph Library on a 100 GB graph

Natural Language Processing (C++11, Python, NLTK, SciKit, Alexa Skills Kit, spaCy, CoreNLP, Stardog, Neo4j)

  • Developed a speech-aided, NLP-based artificial intelligence bot with an end-to-end response time of ~300 ms that stores information from simple English sentences using Stardog and Neo4j and answers questions about the stored information via keyword search
  • Built a genre detection tool with 91% accuracy for text classification (sci-fi, history, physics, art, etc.), using TF-IDF for topic modelling and k-nearest neighbors (KNN) for classification; other techniques, including dependency parsing, bigram models, deep learning models such as CNNs and RNNs, and an ensemble approach with multiple weak voting classifiers, were also tried but performed poorly in terms of accuracy

Advanced Operating Systems: Embedded OS Development (C, XINU, Linux)

  • Implemented virtual memory and a lightweight file system to enhance the security and reliability of the memory management unit in the XINU operating system on an embedded SoC (BeagleBone Black), which is used in handheld gaming consoles and IoT devices
  • Engineered process synchronization mechanisms using semaphores, promises, and futures