@Ahmad Jumah
Ahmad,
This hits close to home - during my time at Deloitte’s Office of General Counsel and Chief Accounting Officer, I was involved in drafting and reviewing disclosures for some of the largest public company data breaches reported in 10-Ks and to state attorneys general. I deeply appreciate the direction and interdisciplinary nature of your work.
Before offering specific suggestions, I’d like to make sure I fully understand the scope and goals of the study. I’ve rewritten the abstract in plain language - does this align with your vision?
Plain-language version of the abstract:
“We’re studying how publicly traded companies describe data breaches in the MD&A section of their 10-K filings. The goal is to determine whether the language they use contains any signals about the breach’s severity or potential financial impact.
To do this, we use large language models (LLMs) - like ChatGPT, Claude, or custom-trained models - to analyze how these companies frame breach-related disclosures. We’re looking for linguistic cues that might help investors, analysts, or regulators assess risk or materiality.
We also want to explore whether these cues can help predict downstream effects on the company - such as changes in financial performance, stock price, or credit ratings.
Ultimately, the research aims to improve how companies report cyber incidents, make it easier for stakeholders to detect meaningful risks, and advance transparency and accountability in financial reporting.”
On your question about “alternative approaches” - just to clarify, are you looking for a different methodological lens for studying breach disclosures specifically, or are you more broadly asking how LLMs might be applied across other areas of accounting and finance?
A few methodological thoughts and clarifying questions that may refine the study design:
- Are breach disclosures typically found in MD&A sections with enough richness for LLM analysis?
In my experience, they often appear in Risk Factors (Item 1A), Legal Proceedings, or MD&A - sometimes across all three. For reference, I’m attaching links below to 10-K excerpts from Equifax (2018) and SolarWinds (2020). Disclosures may be boilerplate or vague - how do you plan to handle that? (Use Ctrl+F to search for "breach".)
SolarWinds 2020 10K: swi-20201231
Equifax 2018 10K: Document
- What performance metrics are you predicting?
Is the impact defined in terms of stock price movement, operational disruption, market sentiment, credit ratings, or a combination?
- Who is the audience for the “financial impact”?
Are you targeting insights useful to investors, regulators, corporate boards - or a cross-section?
- How do you plan to extract materiality cues using LLMs?
Will you feed MD&A text directly into LLMs and prompt for features, or are you using fine-tuned models, embeddings, or classification layers on top of LLM output?
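To make the "embeddings plus a classification layer" option concrete, here is a minimal sketch. It uses TF-IDF vectors as a stand-in for LLM embeddings (in practice you would substitute sentence or document embeddings from a language model) and a logistic regression head; the disclosures and materiality labels are entirely hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical MD&A breach excerpts (labels: 1 = material impact, 0 = not)
texts = [
    "The incident resulted in remediation costs of $425 million and ongoing litigation.",
    "We experienced unauthorized access; we do not believe the incident was material.",
    "Breach-related expenses materially reduced operating income for the period.",
    "A cybersecurity event occurred; no material impact on operations is expected.",
]
labels = [1, 0, 1, 0]

# TF-IDF stands in for LLM embeddings; swap in model embeddings to test
# whether richer representations sharpen the materiality signal.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

new_disclosure = ["Costs related to the breach materially affected our results."]
pred = model.predict(new_disclosure)
print(pred[0])
```

A baseline like this is also a useful control: if LLM-derived features can't beat a bag-of-words classifier, the "linguistic cues" claim needs more support.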
- What’s the strategy for vague or repetitive disclosures?
If most breach narratives are templated or sanitized, could that dilute the predictive power of the model? Will there be a signal-to-noise filtering process?
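On the signal-to-noise question, one simple filter is to score each disclosure's similarity to known boilerplate and set aside near-duplicates before modeling. A sketch using TF-IDF cosine similarity follows; the template sentence, example disclosures, and 0.6 cutoff are all assumptions for illustration, not drawn from your study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical boilerplate template of the kind seen in risk-factor language.
boilerplate = ("We may be subject to cybersecurity incidents that could "
               "adversely affect our business, results of operations, "
               "or financial condition.")

disclosures = [
    "We may be subject to cybersecurity incidents that could adversely "
    "affect our business and results of operations.",
    "In 2020 we incurred $3.5 million of breach remediation costs and "
    "notified 14 state attorneys general.",
]

vec = TfidfVectorizer().fit([boilerplate] + disclosures)
sims = cosine_similarity(vec.transform([boilerplate]),
                         vec.transform(disclosures))[0]

THRESHOLD = 0.6  # assumed cutoff; tune on a labeled sample
for text, sim in zip(disclosures, sims):
    tag = "boilerplate" if sim >= THRESHOLD else "substantive"
    print(f"{sim:.2f} {tag}")
```

The filtered-out templated passages need not be discarded outright - the mere fact that a company stayed in boilerplate after a known breach may itself be a signal worth modeling.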
- What exactly is the predictive innovation?
Are you forecasting the likelihood of a specific financial outcome following a breach disclosure? Or identifying latent risk profiles from the language? Clarifying this could sharpen your contribution to both academic and industry audiences.
Really excited about this work - happy to contribute further, whether as a reviewer, brainstorming partner, or even collaborator. You've touched on a genuinely important area at the intersection of finance, disclosure, risk, and AI. Let’s keep the dialogue going.
Respectfully,
Ilya Ilienko, dual MBA, CPA, CMA
------------------------------
Ilya Ilienko, dual MBA, CPA, CMA
Board Member / Director
East Coast - United States
------------------------------