“More is different.”

P. W. Anderson · Science, 1972

BITS Pilani, K. K. Birla Goa Campus · EEE (Hons.) · Physics minor

Devadarsh A Nair

Undergraduate electronics and electrical engineering student building toward research in memory-centric AI hardware: the physics of memory devices, the mixed-signal circuits that read and write them, and the architectures that decide how data moves.

10 projects documented · 2 ongoing · 7 research areas
i · MOSFET transfer: DIBL shifts VT with drain bias
ii · op-amp open-loop gain and phase
iii · training vs. validation loss
iv · quantum Hall: σxy plateaus, ρxx oscillations

01 · About

A Short Introduction

I am a third-year B.E. (Hons.) Electronics and Electrical Engineering student at BITS Pilani, K. K. Birla Goa Campus, with a minor in Physics.

Most of my coursework and project work sits where electronics meets physics: semiconductor devices, analog circuits, signals, and quantum mechanics. Outside the curriculum I am a research intern at CircuitEvolve, building the LLM-guided evolutionary optimizer at the core of its analog design-automation flow and leading a research section on yield-informed circuit reasoning. At BITSilicon I lead a 12-member team designing an RV32I five-stage pipelined RISC-V processor for a tape-out competition.

I am preparing for research internships in semiconductors and AI hardware and, longer term, graduate research in memory-centric AI hardware: emerging memory devices, the mixed-signal circuits that read and write them, and the architectures they serve. The premise is that data movement, not arithmetic, is becoming the binding constraint on AI systems, and that the layers deciding it run from condensed matter physics up through computer architecture. This site documents the projects, notes, and reading that connect those layers.

02 · Research direction

Three Layers of the Same Stack

Everything on this page sits on one of three layers. The thread through them: in AI systems, moving data now costs more than computing with it, and the fix spans all three.

  1. LAYER 01

    Condensed matter & device physics

    How electrons behave in solids (band structure, transport, tunneling, defects) and how that physics becomes working devices: transistor electrostatics and scaling, and the switching mechanisms of charge, ions, dipoles, and spins that emerging memories are built from.

  2. LAYER 02

    Circuit-level behavior

    How devices compose into amplifiers, comparators, and the periphery every memory lives or dies by: sense amplifiers, references, and write paths, with specs verified by testbenches and scoreboards, not by staring at waveforms.

  3. LAYER 03

    Architecture & the AI workload

    Where the data actually moves: pipelines, caches, memory hierarchies, and the accelerator workloads that stress them. Machine learning enters as an instrument for the layers below (optimizers and surrogates for design), not as the destination.

The formal spine is still the Physics minor: Quantum Mechanics and Semiconductor Devices so far, statistical mechanics and solid state ahead. The long-term aim is graduate research in memory-centric AI hardware: physics-aware memory devices and the mixed-signal and architectural systems built on them. The physics is not a detour: memory devices are applied condensed matter, and the goal is to be able to follow a bit from switching mechanism to sense amplifier to workload.

  • Semiconductor device physics
  • Emerging memory devices
  • Condensed matter physics
  • Analog & mixed-signal circuits
  • Memory systems & computer architecture
  • AI-assisted EDA

03 · Vision

Problems I Want to Work On

A working map for the next several years. One sentence holds it together:

Data movement, not arithmetic, is the bottleneck in AI computing. Work on the devices, circuits, and architectures that reduce it.
Primary direction

Emerging memory devices for AI hardware

Resistive, ferroelectric, phase-change, and magnetic memories each turn a piece of condensed matter physics (ion motion, polarization, phase transitions, spin) into a stored bit. I want to work on that translation: switching mechanism to device model to array behavior, with variability and reliability kept honest. My entry points are the device-physics coursework and the Sentaurus TCAD work below.

Core circuits

Mixed-signal circuits for memory & compute-in-memory

A memory array is only as good as its periphery: sense amplifiers, comparators, data converters, write drivers. Analog compute-in-memory sharpens the same problem, since the periphery sets the accuracy and energy budget. This is where the analog training pays off, and these blocks are the natural next targets for CircuitEvolve’s SPICE-verified optimization loop.

From CircuitEvolve

Physics-aware AI for circuit & device design

LLM-guided evolutionary search, surrogate models, and optimization loops that treat the simulator as ground truth. CircuitEvolve is the working artifact; the direction is design automation that knows about process variation, mismatch, and yield rather than just nominal specs. It is the niche where an EE background is an advantage rather than a tax.

Systems view

Memory systems & AI workloads

Why the layers below matter: accelerator workloads are increasingly bound by memory bandwidth and capacity, not FLOPs. The SAiDL study measured that at the workload level (attention’s VRAM and throughput at long context); the RISC-V processor’s roadmap of caches, memory hierarchy, and performance counters is my on-ramp to studying data movement at the architecture level.

These are directions, not achievements. The projects below are first steps; the reading log tracks the gap honestly.

04 · Featured projects

Featured Projects

The work I would want a research mentor to look at first. Each card states what was actually done, what was learned, and what is still missing.

Ongoing Analog IC design · EDA · ML · Research internship · May – July 2026

CircuitEvolve: A Machine-Learning Optimizer for Analog Circuits

The LLM-guided evolutionary optimizer at the core of CircuitEvolve’s analog design-automation flow, built during a research internship there: parallel subagents propose, mutate, and critique candidate designs, a fitness pipeline scores them against extracted specifications, and every survivor is verified in SPICE across PVT corners and Monte Carlo mismatch before entering the next generation. It builds on earlier groundwork assembling simulation-ready analog datasets, testbenches, and specification extraction.

Why it matters

Automated analog sizing needs two things at once: a search that optimizes against real specifications, and data that carries enough context to define them. Raw netlists are not enough; a useful analog representation has to encode testbenches, bias conditions, specifications, and the design intent behind them, or the circuit’s meaning is lost. And the blocks where variability-aware automated sizing would matter most (sense amplifiers, comparators, data converters, the periphery of memory and compute-in-memory systems) are exactly the ones that live or die by mismatch and yield, which is what the yield-informed search is for.

What I did

  • Built the optimizer end-to-end: a language model proposes device sizings and netlist edits inside an evolutionary search, and each candidate is evaluated automatically in simulation instead of by hand.
  • Lead a research section on yield-informed circuit reasoning: extending the search beyond nominal specifications so its decisions account for process variation, device mismatch, and expected yield, keeping generated circuits robust across PVT corners.
  • Part of the team preparing a conference-paper submission on the framework; own the experimental evaluation and benchmarking against baseline sizing and topology-search methods.
  • Wrote the evaluation and scoring pipeline: each candidate is simulated across process-voltage-temperature corners and Monte Carlo mismatch in ngspice, then reduced to a single score over gain, phase margin, bandwidth, power, and yield, with that breakdown fed back into the next generation.
  • Took a fully-differential folded-cascode op-amp as the first circuit under optimization.
  • Reproduced MOSFET-level op-amp and OTA circuits from papers, including gain-boosted and folded-cascode topologies.
  • Built and modified LTspice/ngspice testbenches for AC, DC, transient, and operating-point analysis.
  • Extracted and organized specifications: gain, phase margin, unity-gain frequency, supply current, power, load and compensation capacitance, bias conditions.
  • Worked out how a dataset should represent netlists, testbenches, specs, PVT/corner information, sweeps, results, and golden reference parameters.

What I learned

Analog data needs far more context than digital: the same netlist means different things under different bias, load, and measurement definitions. Reproducing published circuits is interpretation, not copying. And turning a specification into a single score is itself a design choice: the shape of the scoring curve decides what the search actually pursues.
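
The point about score shaping can be made concrete with a minimal sketch. It is not the CircuitEvolve scoring pipeline: the logistic shaping, the geometric mean, and every target and scale in `SPECS` are assumptions chosen only for illustration.

```python
import math

def soft_margin(value, target, scale, higher_is_better=True):
    """Map a measured spec to (0, 1) with a logistic curve centred on its target."""
    signed = (value - target) if higher_is_better else (target - value)
    z = max(-50.0, min(50.0, signed / scale))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical targets and scales (not taken from any real design).
SPECS = {
    "gain_db":  dict(target=60.0, scale=3.0, higher_is_better=True),
    "pm_deg":   dict(target=60.0, scale=5.0, higher_is_better=True),
    "ugf_mhz":  dict(target=50.0, scale=5.0, higher_is_better=True),
    "power_mw": dict(target=1.0,  scale=0.1, higher_is_better=False),
}

def score(measured, yield_fraction):
    """Return (total score, per-spec breakdown) for one candidate."""
    per_spec = {name: soft_margin(measured[name], **cfg) for name, cfg in SPECS.items()}
    # Geometric mean: one badly missed spec drags the whole score down.
    geo_mean = math.exp(sum(math.log(s) for s in per_spec.values()) / len(per_spec))
    return geo_mean * yield_fraction, per_spec
```

A smaller `scale` makes a spec behave like a hard pass/fail limit; a larger one rewards overshoot. The same measured circuit can therefore rank differently depending on the shaping alone.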

Limitations

Papers often omit bias voltages, device models, or simulation conditions; without the original PDK, generic models reproduce qualitative behavior rather than exact published numbers. On the optimizer side, Monte Carlo sampling noise limits how finely two close candidates can be separated, and process spread is approximated by discrete corners rather than continuous variation.
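
The Monte Carlo limitation follows from binomial sampling noise. The sketch below assumes yield is estimated as a simple pass fraction over independent runs; the two-standard-error threshold `z` is an illustrative choice, not a setting from the project.

```python
import math

def yield_standard_error(passes, runs):
    """Binomial standard error of a pass-fraction yield estimate."""
    p = passes / runs
    return math.sqrt(p * (1.0 - p) / runs)

def separable(passes_a, passes_b, runs, z=2.0):
    """True if the yield gap between two candidates exceeds z combined standard errors."""
    gap = abs(passes_a - passes_b) / runs
    combined = math.sqrt(
        yield_standard_error(passes_a, runs) ** 2
        + yield_standard_error(passes_b, runs) ** 2
    )
    return gap > z * combined
```

With 200 runs each, candidates at 90% and 93% yield are not separable under this test; at 5000 runs each they are. The noise shrinks only as the square root of the run count.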

  • LTspice
  • ngspice
  • Cadence Virtuoso
  • xschem
  • Python
  • Git
  • Linux / WSL
  • LaTeX

candidate sizing → SPICE response → measured specs

GAIN · PM · UGF · IDD

Ongoing Computer architecture · RTL · BITSilicon · project lead, 12-member team

RV32I Five-Stage Pipelined RISC-V Processor

A fully synthesizable 32-bit RISC-V processor in Verilog: a five-stage in-order pipeline with full forwarding, stall insertion, static branch prediction, and Harvard memory interfaces, synthesized for a Xilinx Artix-7 FPGA. I lead the 12-member team, owning microarchitecture decisions, task allocation, and integration across the RTL, verification, and synthesis workstreams. The design is being submitted to a tape-out competition with a path to silicon fabrication.

Why it matters

A pipeline is where data movement first becomes visible: hazards, stalls, and memory interfaces are the small-scale version of the bandwidth walls that dominate AI accelerators. The roadmap points the same core at that question directly, with caches, a memory hierarchy, and performance counters for studying data-movement bottlenecks in compute-intensive workloads.

What I did

  • Leading microarchitecture decisions, task allocation, and integration across RTL, verification, and synthesis for a 12-member team.
  • Five-stage in-order pipeline with full forwarding, stall insertion, and static branch prediction over the RV32I base ISA.
  • Harvard instruction/data memory interfaces; the design synthesizes for a Xilinx Artix-7 FPGA.
  • Preparing the submission for a tape-out competition with a path to silicon fabrication.
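
For readers unfamiliar with forwarding and stall insertion, here is a behavioural sketch of the textbook five-stage scheme. It is not the team's RTL: the function and signal names are hypothetical, and the priority rule (the newer EX/MEM result wins over MEM/WB) is the standard textbook choice.

```python
def forward_select(rs, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Pick the source for one ALU operand that reads register `rs`."""
    # x0 is hard-wired to zero in RV32I, so it is never forwarded.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == rs:
        return "EX_MEM"    # newest in-flight result has priority
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == rs:
        return "MEM_WB"
    return "REGFILE"

def load_use_stall(id_ex_memread, id_ex_rd, if_id_rs1, if_id_rs2):
    """A load followed directly by a consumer needs a one-cycle bubble."""
    return bool(id_ex_memread and id_ex_rd != 0 and id_ex_rd in (if_id_rs1, if_id_rs2))
```

Forwarding covers ALU-to-ALU dependences; the stall exists because load data is not available until the MEM stage has completed.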

Limitations

In progress: RTL and verification are under active development, and the cache / memory-hierarchy / performance-counter roadmap is stated intent, not implemented yet. Nothing here is silicon-proven until the tape-out actually happens.

RV32I · 5-STAGE · FWD · BP

  • Verilog
  • Xilinx Artix-7
  • Git

five pipeline stages · one instruction issued per cycle · forwarding path

Completed Device physics · TCAD · Course project · Sem-4, 2025–26

MOSFET Design and Short-Channel Scaling Study in Sentaurus TCAD

Designed an nMOS capacitor and a 1 µm gate-length MOSFET (N_D = 10²⁰ cm⁻³ source/drain) in Sentaurus TCAD, then studied what changes, and what breaks, as the gate scales from 1 µm down to 20 nm.

What I did

  • Extracted Vth (constant-current method), subthreshold swing, and Ion/Ioff from IDS–VGS sweeps; characterized IDS–VDS output behavior.
  • Quantified drain-induced barrier lowering across drain biases from 50 mV to 2 V.
  • Swept gate length (1 µm → 20 nm), oxide thickness (1–5 nm), substrate doping (10¹⁵–10¹⁸ cm⁻³), substrate thickness, and temperature (250–400 K).
  • Analyzed energy band diagrams, potential profiles, and carrier concentrations along the oxide–semiconductor interface.
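
The extractions listed above can be sketched in a few lines. This is an illustrative post-processing sketch on synthetic data, not the Sentaurus workflow: the helper names, the critical current, and the purely exponential `synthetic_sweep` are all assumptions.

```python
import math

def vth_constant_current(vg, ids, i_crit):
    """Gate voltage where Id crosses i_crit, interpolated linearly in log10(Id)."""
    for k in range(1, len(vg)):
        lo, hi = ids[k - 1], ids[k]
        if lo < hi and lo <= i_crit <= hi:
            frac = (math.log10(i_crit) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))
            return vg[k - 1] + frac * (vg[k] - vg[k - 1])
    raise ValueError("sweep never crosses i_crit")

def subthreshold_swing_mv_per_dec(vg, ids, i_lo, i_hi):
    """Gate-voltage change per decade of current between two subthreshold levels."""
    dv = vth_constant_current(vg, ids, i_hi) - vth_constant_current(vg, ids, i_lo)
    return 1e3 * dv / (math.log10(i_hi) - math.log10(i_lo))

def dibl_mv_per_v(vth_low_vd, vth_high_vd, vd_low, vd_high):
    """Threshold shift per volt of drain bias, reported as a positive number."""
    return 1e3 * (vth_low_vd - vth_high_vd) / (vd_high - vd_low)

def synthetic_sweep(vt, ss_mv, i_at_vt=1e-7, points=101):
    """Idealised exponential Id-Vg curve on a 10 mV grid (stand-in for TCAD output)."""
    vg = [k * 0.01 for k in range(points)]
    ids = [i_at_vt * 10 ** ((v - vt) / (ss_mv * 1e-3)) for v in vg]
    return vg, ids
```

In practice the critical current is usually scaled by W/L, and the swing is read well below threshold where the curve is still exponential.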

What I learned

TCAD ties geometry and doping directly to electrical behavior: short-channel effects and DIBL stop being formulas and become visible in the band diagram.

VTH · SS · ION/IOFF · DIBL

  • Sentaurus TCAD
  • Device simulation

nMOS cross-section · channel formation and the I–V curve as the gate length Lg shrinks

Completed Machine learning · SAiDL induction assignment · Summer 2026

Efficient Transformers and Sparse Fine-Tuning: SAiDL Summer Assignment

Two-track study for the SAiDL (Society for AI and Deep Learning, BITS Goa) summer induction: a modular decoder-only Transformer language model with swappable attention, positional encodings, and block structure on WikiText-2, plus parameter-efficient fine-tuning of DeBERTa-v3 on GLUE CoLA, all trained on a single RTX 4060.

What I did

  • Implemented six attention variants (standard, sliding-window, sparse-block, linear, GQA, MQA) and benchmarked validation perplexity, peak VRAM, stability, and throughput across context lengths 256–4096.
  • Ran a train-short-test-long study across five positional encodings (learned absolute, sinusoidal, RoPE, ALiBi, relative bias) and built causal-conv + attention hybrid blocks.
  • Compared LoRA, AdaLoRA, and SoRA on CoLA (Matthews correlation), and derived and implemented proximal soft-thresholding vs. SGD ℓ₁-subgradient gate updates, verified for parity in NumPy and PyTorch.
  • Extended the SoRA adaptation principle to xLSTM and Mamba backbones; wrote the full LaTeX report.
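
The difference between the two gate updates compared above shows up even on a single scalar gate. This is a minimal sketch, not the assignment code: `lr` is the learning rate, `lam` the ℓ₁ strength, and `grad` the gradient of the task loss with respect to the gate, all hypothetical names.

```python
def soft_threshold(x, thresh):
    """Proximal operator of thresh * |x|: shrink toward zero and clip to exactly zero."""
    if x > thresh:
        return x - thresh
    if x < -thresh:
        return x + thresh
    return 0.0

def prox_step(gate, grad, lr, lam):
    """Gradient step on the smooth loss, then the l1 proximal map."""
    return soft_threshold(gate - lr * grad, lr * lam)

def subgradient_step(gate, grad, lr, lam):
    """Plain SGD on loss + lam * |gate|, using sign(gate) as the subgradient."""
    sign = (gate > 0) - (gate < 0)
    return gate - lr * (grad + lam * sign)
```

For a gate of 0.05 with zero task gradient, `lr = 0.1`, and `lam = 1.0`, the proximal step returns exactly 0.0, while the subgradient step overshoots to -0.05 and then bounces back. Only the proximal update produces exact zeros, which is what makes the resulting sparsity usable for pruning.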

What came out

At context 4096, linear attention cut peak VRAM from 10.6 GB to 4.7 GB and lifted throughput from ~0.8k to ~17.5k tokens/s (sparse-block reached ~33.9k). RoPE had the best perplexity at 512; ALiBi extrapolated flattest. On CoLA, LoRA, AdaLoRA, and proximal SoRA finished too close together for single-seed runs to rank them.

Limitations

Diagnostic rather than fully converged: short hardware sweeps, single-seed CoLA runs, and from-scratch xLSTM/Mamba training that is not comparable to pretrained fine-tuning.

What this shows

At long context, attention is a memory problem before it is a compute problem: what the benchmarks actually measured was VRAM and bandwidth pressure, not FLOPs. This is the workload-level end of the data-movement question the rest of this page approaches from devices and circuits.
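
A back-of-envelope sketch of why memory grows faster than the model does. It assumes the attention score matrix is materialized in full, which fused kernels avoid, and every size in the usage note is hypothetical rather than taken from the benchmark.

```python
def score_matrix_bytes(seq_len, n_heads, batch=1, bytes_per_el=2, window=None):
    """Upper bound on one layer's attention-score storage (fp16 by default)."""
    keys_per_query = seq_len if window is None else min(window, seq_len)
    return batch * n_heads * seq_len * keys_per_query * bytes_per_el

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, batch=1, bytes_per_el=2):
    """Keys plus values kept for every layer and every KV head."""
    return 2 * batch * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_el
```

For 8 heads at sequence length 4096, the dense score matrix takes 256 MiB per layer and quadruples when the context doubles; a 256-token sliding window brings it to 16 MiB. Sharing one KV head across 8 query heads (MQA) cuts the KV cache by 8×.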

PPL · VRAM · MCC · PROX-ℓ1

  • PyTorch
  • Hugging Face
  • PEFT
  • NumPy
  • pandas
  • Weights & Biases
  • CUDA
  • LaTeX

attention mask · measured VRAM (GB) at ctx 4096 · loss curve

Completed Digital design · Verification · BITSilicon · Sem-4, 2025–26

Synchronous FIFO with Self-Checking Verification Environment

Parameterized synchronous FIFO in Verilog (configurable DATA_WIDTH and DEPTH), verified against an independent golden reference model with a per-cycle scoreboard, directed corner-case tests, and manual coverage counters.

What I did

  • Designed pointer and occupancy logic with wr_full / rd_empty flags, including simultaneous read–write handling.
  • Compared rd_data, count, and flags against the golden model every cycle, with detailed failure reporting.
  • Wrote directed tests for reset, fill, drain, overflow, underflow, simultaneous read/write, and pointer wrap-around.
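
The golden-model idea can be sketched in software. This Python model is an illustration, not the project's Verilog reference model: it assumes the full and empty flags are sampled before the clock edge, so a write into a full FIFO is dropped even when a read happens in the same cycle.

```python
from collections import deque

class FifoModel:
    """Cycle-level reference model of a synchronous FIFO."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    @property
    def count(self):
        return len(self.items)

    @property
    def wr_full(self):
        return len(self.items) == self.depth

    @property
    def rd_empty(self):
        return len(self.items) == 0

    def step(self, wr_en, wr_data, rd_en):
        """Advance one clock; return the word read this cycle, or None."""
        full_before, empty_before = self.wr_full, self.rd_empty
        rd_data = None
        if rd_en and not empty_before:
            rd_data = self.items.popleft()
        if wr_en and not full_before:
            self.items.append(wr_data)
        return rd_data

def compare(cycle, dut, ref):
    """Scoreboard check: list every field where DUT and model disagree."""
    return [
        f"cycle {cycle}: {key} dut={dut[key]!r} ref={ref[key]!r}"
        for key in ref
        if dut[key] != ref[key]
    ]
```

A scoreboard would call `step` with the same stimulus the DUT sees and pass both sets of outputs to `compare` every cycle, failing on the first non-empty result.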

What this shows

Verification discipline: the testbench is designed, not improvised. Bugs are caught by the scoreboard, not by staring at waveforms.

DUT · GOLDEN · SCOREBOARD · COVERAGE

  • Verilog
  • Git
  • Linux / WSL

Completed Digital design · HW/SW co-design · BITSilicon · Sem-4, 2025–26

Digital Stopwatch Controller: Verilog + Verilator

Synchronous stopwatch controller in Verilog, with modular seconds (0–59) and minutes (0–99) counters under an IDLE / RUNNING / PAUSED control FSM, driven and validated from a cycle-accurate C++ program via Verilator.

What I did

  • Enforced a fully synchronous style: non-blocking assignments, synchronous up-counters, active-low reset.
  • Supported start, stop, resume, and reset through the control FSM.
  • Verified with a self-checking Verilog testbench, then drove the DUT from a Verilator C++ harness validating reset, pause/resume, and periodic MM:SS readout.
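
A behavioural sketch of the control FSM and counters. It is a software model, not the Verilog: it assumes one call to `tick` per counted second, treats `start` in PAUSED as resume, and wraps minutes from 99 back to 0; the input names are hypothetical.

```python
IDLE, RUNNING, PAUSED = "IDLE", "RUNNING", "PAUSED"

class Stopwatch:
    def __init__(self):
        self.state, self.minutes, self.seconds = IDLE, 0, 0

    def tick(self, start=False, stop=False, reset=False):
        """Evaluate one counting tick with the given control inputs."""
        if reset:                       # reset has priority in every state
            self.state, self.minutes, self.seconds = IDLE, 0, 0
        elif self.state == RUNNING:
            if stop:
                self.state = PAUSED
            else:
                self._count()
        elif start:                     # IDLE -> RUNNING, or resume from PAUSED
            self.state = RUNNING

    def _count(self):
        self.seconds += 1
        if self.seconds == 60:          # seconds run 0-59
            self.seconds = 0
            self.minutes = (self.minutes + 1) % 100   # minutes run 0-99

    def readout(self):
        return f"{self.minutes:02d}:{self.seconds:02d}"
```

Starting the stopwatch and ticking 61 times gives a readout of `01:01`; a tick with `stop=True` then freezes it until the next `start`.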

What this shows

Clean synchronous RTL style, and that hardware can be driven and inspected from software in a cycle-accurate way.

IDLE → RUNNING → PAUSED · MM:SS

  • Verilog
  • Verilator
  • C++
  • Git
  • Linux / WSL

05 · Curriculum-based projects

Curriculum-Based Projects

Course projects from Sem-4 (2025–26). Smaller in scope than the featured work, but they are where the foundations (device behavior, signal processing, quantum mechanics, low-level systems) were actually built.

Completed Electronic devices · Sem-4

Rectifying Circuits: Schottky vs. PN Junction Diodes

Compared Schottky and PN-junction diodes in half- and full-wave rectifiers using LTspice and bench measurements: forward drop, reverse recovery, and rectification efficiency. Schottky diodes won on low-voltage, high-frequency switching; PN diodes held up better under reverse bias.

VF · TRR · EFFICIENCY

  • LTspice
  • Bench measurement

Completed DSP · Sem-4

Adaptive Noise Cancellation in MATLAB

LMS-based adaptive noise cancellation built with the DSP System Toolbox and tested on synthetic and audio signals. Studied how step size and filter order trade off convergence speed, stability, and steady-state MSE, and validated the SNR improvement.
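
The LMS update itself is short enough to show. This is a plain-Python sketch, not the MATLAB DSP System Toolbox implementation: the two-tap noise path, the sine test signal, and the step size are assumptions chosen for illustration.

```python
import math
import random

def lms_cancel(primary, reference, order, mu):
    """Adaptive noise canceller: returns (error signal, final weights)."""
    w = [0.0] * order
    buf = [0.0] * order                      # most recent reference samples
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # noise estimate
        e = d - y                                    # cleaned output
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]
        out.append(e)
    return out, w

rng = random.Random(0)
n = 5000
signal = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
noise_ref = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical noise path from the reference sensor to the primary input.
primary = [
    s + 0.8 * noise_ref[k] - 0.3 * (noise_ref[k - 1] if k else 0.0)
    for k, s in enumerate(signal)
]
cleaned, weights = lms_cancel(primary, noise_ref, order=4, mu=0.005)
```

A larger `mu` converges faster but leaves more weight jitter, and therefore a higher steady-state MSE; past a stability bound set by the filter order and the reference power, the filter diverges.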

LMS · MSE · SNR

  • MATLAB
  • DSP System Toolbox

Completed Condensed matter · Sem-4

Landau Levels and the Integer Quantum Hall Effect

Derived Landau quantization for a 2D electron gas in a perpendicular magnetic field and connected level degeneracy to integer quantum Hall conductance plateaus, high-mobility 2DEG systems, and metrological resistance standards.
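
The standard results this derivation leads to are the Landau-level spacing ħω_c = ħeB/m*, the degeneracy eB/h per unit area per level, and the plateau resistance h/(νe²). A short numerical sketch follows; the function names and the GaAs effective-mass ratio in the usage note are illustrative assumptions, not values from the project report.

```python
import math

H = 6.62607015e-34       # Planck constant, J s (exact in the 2019 SI)
E = 1.602176634e-19      # elementary charge, C (exact in the 2019 SI)
HBAR = H / (2 * math.pi)
M_E = 9.1093837015e-31   # electron rest mass, kg

def cyclotron_energy_mev(b_tesla, m_eff_ratio):
    """Landau-level spacing hbar * omega_c in meV for effective mass m* = ratio * m_e."""
    energy_joule = HBAR * E * b_tesla / (m_eff_ratio * M_E)
    return energy_joule / E * 1e3

def states_per_area(b_tesla):
    """Degeneracy of one spin-resolved Landau level, in states per m^2."""
    return E * b_tesla / H

def filling_factor(n_2d, b_tesla):
    """Number of filled Landau levels for sheet density n_2d (per m^2)."""
    return n_2d / states_per_area(b_tesla)

def hall_resistance_ohm(nu):
    """Plateau Hall resistance h / (nu * e^2)."""
    return H / (nu * E * E)
```

With an effective-mass ratio of 0.067 (a commonly quoted value for GaAs) and a 10 T field, the level spacing is about 17 meV. The ν = 1 plateau sits at about 25 812.807 Ω, the von Klitzing constant that underlies the resistance standard.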

ħωC · FILLING FACTOR ν · σXY

  • Analytical derivation
  • Quantum mechanics

Completed Microprocessors · Sem-4

8086-Based Whack-a-Mole Game

An interactive Whack-a-Mole game on the 8086 in MASM assembly: 8255 PPI for LED and push-button I/O, the 8253 programmable timer for randomized events, and interrupt-driven timing and scoring, simulated in Proteus.

8255 · 8253 · INT · LED/KEY

  • 8086 assembly
  • MASM
  • Proteus

06 · Tools & skills

The Working Set

The languages and tools I have picked up over the last two years, from SPICE decks and TCAD to PyTorch training runs.

Languages

  • Python
  • C
  • C++
  • MATLAB
  • Verilog
  • x86 Assembly (MASM)

EDA & simulation

  • Sentaurus TCAD
  • Cadence Virtuoso
  • LTspice
  • ngspice
  • xschem
  • OrCAD PSpice
  • Simulink
  • Proteus
  • Verilator
  • Icarus Verilog
  • GTKWave

ML & scientific computing

  • PyTorch
  • Hugging Face
  • NumPy
  • pandas
  • scikit-learn
  • SciPy
  • Matplotlib
  • Weights & Biases
  • CUDA
  • Jupyter

Workflow

  • Git
  • GitHub
  • Linux / WSL
  • LaTeX

07 · Reading log

Reading Log

Books and papers, by area. Statuses are honest: “implemented” means implemented; everything else is in progress, revisiting, or queued.

Quantum mechanics

  • Sakurai, Modern Quantum Mechanics (in progress)
  • Griffiths, Introduction to QM (revisiting)
  • Shankar, Principles of Quantum Mechanics (queued)

Condensed matter & statistical physics

  • Ashcroft & Mermin, Solid State Physics (queued)
  • Girvin & Yang, Modern Condensed Matter Physics (queued)
  • Kardar, Statistical Physics of Particles (queued)

Machine learning

  • RoPE · ALiBi · relative-bias papers (implemented)
  • LoRA · AdaLoRA · SoRA papers (implemented)
  • Mamba · xLSTM papers (implemented)
  • Cover & Thomas, Information Theory (queued)
  • Boyd & Vandenberghe, Convex Optimization (queued)

Device physics

  • Neamen, Semiconductor Physics and Devices (revisiting)
  • Pierret, Semiconductor Device Fundamentals (revisiting)

Emerging memory & AI hardware

The new center of gravity: device physics of RRAM, PCM, FeFET, and MRAM, compute-in-memory, and the memory-systems view of AI accelerators. Queue being assembled from review articles before committing to one device class.

queue being assembled

Analog IC design

  • Sedra & Smith, Microelectronic Circuits (revisiting)
  • Razavi, Design of Analog CMOS Integrated Circuits (revisiting)

Circuit design automation

Analog synthesis, optimization, LLM-based EDA.

queue being assembled

08 · Grad school journey

Where This Is Going

I am using my undergraduate years to build the layers that memory-centric AI hardware needs: device physics, analog and digital circuits, computer architecture, and the condensed matter physics beneath them. The other habit under construction is documenting technical work properly.

  1. 2024 – 2025

    Foundations

    First-year groundwork: EEE core, programming, digital systems.

  2. 2025 – 2026

    Devices, circuits, signals, quantum

    Device physics and the TCAD simulation lab, analog circuits, signal processing, and control. Quantum Mechanics I and Physics of Semiconductor Devices anchored the physics minor. RTL design and verification work (the FIFO and stopwatch projects) with BITSilicon; the curriculum projects above come from this period.

  3. 2026 →

    Research work, and a direction

    The SAiDL summer assignment added a working deep-learning toolset (Transformers, attention variants, parameter-efficient fine-tuning) and a first workload-level look at memory bottlenecks. The main lines of work now: a research internship at CircuitEvolve, building the LLM-guided evolutionary optimizer and leading its yield-informed extension, and leading the BITSilicon RV32I processor team toward a tape-out competition. The direction ahead has narrowed to memory-centric AI hardware; summer 2027 internship preparation and a reading queue in emerging memory devices are the current groundwork.

I am early in this. The plan for the remaining undergraduate years is to build depth rather than breadth for its own sake, document the work honestly, limitations included, and produce at least one piece of work that holds up to research scrutiny.

Relevant coursework

Selected from the full transcript, grouped by how it feeds the research direction.

Devices & microelectronics

  • Electronic Devices
  • Electronic Devices Simulation (TCAD lab)
  • Physics of Semiconductor Devices
  • Microelectronic Circuits
  • Electrical Sciences
  • Electrical Machines

Circuits, signals & control

  • Digital Design
  • Microprocessors & Interfacing
  • Signals & Systems
  • Control Systems

Physics & mathematics

  • Quantum Mechanics I
  • Electromagnetic Theory I
  • Mechanical Oscillations & Waves
  • Thermodynamics
  • Mathematics I–III
  • Probability & Statistics

Foundations & practice

  • Computer Programming
  • Technical Report Writing
  • Practice School I (summer 2026)

09 · CV

Curriculum Vitae

A one-page CV is available as a PDF. The project cards above carry more detail than the CV format allows; if you are evaluating my work, they are the better starting point.

10 · Contact

Get in Touch

I am open to research conversations, technical collaboration, and feedback on my work. Email is the fastest way to reach me.