NVIDIA, Santa Clara, California

Reflex Latency Analyzer Engineering Intern

Aug 2021 – Dec 2021

  • Worked on the G-SYNC Reflex Latency Analyzer team as an FPGA/firmware engineer responsible for real-time performance metrics of gaming monitors
  • Reduced the embedded CPU load for detecting mouse clicks and keyboard taps by adding a hardware USB transaction filter that discards irrelevant USB packets
  • Achieved a >2x speedup per transaction by designing RTL and the corresponding firmware changes to snoop USB traffic at the packet level and record events of interest such as keyboard taps and mouse clicks
  • Used debugging and packet-sniffing tools such as Signal Tap and Wireshark to identify performance-critical areas for improvement

Lawrence Livermore National Laboratory, Livermore, California

Computational Engineering Division Intern | Dr. Maya Gokhale

May 2019 – Aug 2019

  • Built a dynamic delay unit emulator called “Logic in-Memory Emulator (LiME)” on a Zynq UltraScale+ MPSoC FPGA (Fidus Sidewinder or ZCU102) that allowed engineers in the U.S. Department of Energy to emulate current and future memory systems as well as near-memory accelerators
  • Achieved execution only 20x slower than real time, orders of magnitude faster than typical simulators

Center for Research in Extreme Scale Technologies, IU Bloomington, Indiana

Research Assistant | Prof. Thomas Sterling

Jun 2017 – Aug 2018

  • Engineered a test infrastructure for a cycle-accurate, FPGA-based simulator of a non-von Neumann general-purpose computer called the Continuum Computer Architecture
  • Experimented with interfaces for host to programmable logic (PCIe), processing system to programmable logic (AXI buses), and programmable logic to programmable logic (QSFP28), and incorporated them into the test infrastructure for maximum performance

Kelley School of Business & Dept. of Linguistics, Indiana University

Research Intern | Prof. Damir Cavar

Nov 2016 – May 2017

  • Developed and open-sourced the Free Linguistic Environment (FLE), a grammar engineering platform for the Lexical Functional Grammar (LFG) framework, on a team of 4 engineers and 2 data scientists
  • Built the “Guesser Module” from scratch to predict typographical errors and missing words, along with their semantic meaning, using deep semantic parsing that considers word position, bigram models, and machine learning techniques such as TF-IDF, regression, bag of words, and k-nearest neighbors

Tata Consultancy Services, India

Systems Engineer

Feb 2013 – Jun 2016

  • Designed, developed, and maintained data warehouses for General Electric US across multiple domains such as payroll, treasury, and oil and gas, averaging 5 TB in total size
  • Improved query run times for multi-dimensional analysis by 12% on average and view generation times by 10%
  • Automated ETL processes for transforming large volumes of data using Informatica PowerCenter Client Tool and shell scripts, and revamped the data warehouse, while handling client interaction, source control, integration testing, and server deployment

Teaching Experience

Instructor | Department of Intelligent Systems Engineering, Indiana University – Bloomington, IN

Jun 2020 – Present

  • Designed and taught a 4-credit, 100% online course on Software Systems Engineering
  • Introduced students to tools such as gdb, valgrind, gprof, git, vim, and emacs

Associate Instructor | Department of Intelligent Systems Engineering, Indiana University – Bloomington, IN

Aug 2018 – Present

  • Taught 3 courses: Advanced Operating Systems, Digital Design with FPGAs, and Computer Networks
  • Automated grading with an autograder across all three courses, saving ~150 hours/semester

Digital Design with FPGAs (Prof. Andrew Lukefahr) - Class Size: ~25 students

  • Introduced FPGA development using Verilog in Vivado, focused on hardware acceleration of software-based applications
  • Mentored a hardware/software co-design project for accelerating machine learning that reduced dot-product computation from 2210 to 130 clock cycles

Advanced Operating Systems (Prof. Martin Swany) - Class Size: ~100 students

  • Guided students through building an operating system from the ground up using the Xinu operating system
  • Introduced virtual memory, file systems, scheduling, and synchronization, and helped students implement them to extend a bare-bones operating system on the BeagleBone Black embedded board

Computer Networks (Prof. Martin Swany) - Class Size: ~100 students

  • Taught networking protocols (UDP, RUDP, and TCP)
  • Helped students build a project using OpenFlow and Ryu, a software-defined networking controller framework, that manages routes, learns routes from network traffic, and performs load balancing