Seongland
About
Articles
Publications
Projects
Apps
Publications
Research papers and technical writing.
CRL: Concept Bottleneck Sparse Autoencoders for Scalable Concept Representation Learning
Preprint
Mechanistic Interpretability
Sparse Autoencoders
Steering Vectors
LLM Safety
Confidence Manifold: Revealing LLM Internal States via Topological Inconsistency Analysis
Preprint
LLM Safety
Mechanistic Interpretability
CorrSteer: Steering LLMs via Correlation-based Corrections
Preprint
Steering Vectors
Mechanistic Interpretability
LLM Safety
FaithfulSAE: Evaluating Faithfulness of Sparse Autoencoders in Large Language Models
ACL 2025 SRW
Sparse Autoencoders
Mechanistic Interpretability
LibVulnWatch: Automated Vulnerability Monitoring Using LLMs for Open-Source Libraries
ICML 2025 TAIG
LLM Safety
RTSum: Relation Triple-based Interpretable Summarization with Multi-level Salience Visualization
NAACL 2024 Demo
Summarization