Small Business Shared Interest Group

  • 1.  Any suggestions for Using LLM to analyze Data Breach Disclosure?

    Posted 05-11-2025 01:17 PM
    One of my presentations at ECIS in Amman focused on
    UNVEILING DATA BREACH DISCLOSURES: AN INTEGRATED ANALYSIS USING LARGE LANGUAGE MODELS
    Any suggestions?
    Ahmad H. Juma'h, Ph.D., CPA, CMA
    Professor of Accounting
    Elham Khorasani Buxton, Ph.D.
    Associate Professor of Computer Science
    Shashank Mani Tripathi, MS
    Computer Science
    University of Illinois Springfield
    Springfield, Illinois, USA
    Abstract
    This study introduces an advanced impact materiality disclosure framework for breached companies by integrating large language models (LLMs) and ensemble learning techniques. Using Management Discussion & Analysis (MD&A) sections from 10-K filings, we develop predictive classifiers based solely on publicly available financial data from firms reporting data breach incidents in their financial statements. Unlike traditional regression-based models, our approach enhances predictive accuracy by leveraging natural language processing (NLP) to extract materiality reporting cues related to data breach features. Further, we use predictive analysis and machine learning (ML) to predict the impact on performance. This hybrid methodology combines structured financial data with unstructured textual insights, offering a scalable, cost-effective solution for identifying financial impact materiality reporting. Our findings contribute to academia and industry by advancing predictive analytics for materiality disclosures, improving risk assessment, and strengthening financial transparency in publicly traded companies.


    ------------------------------
    Ahmad H. Juma'h, Ph.D., CMA, CPA
    Professor of Accounting
    University of Illinois Springfield
    Illinois, USA
    ------------------------------


  • 2.  RE: Any suggestions for Using LLM to analyze Data Breach Disclosure?

    Posted 05-13-2025 03:32 PM

    Thanks for posting the results of your ongoing work, @Ahmad Jumah.

    I, for one, would love to see how you bring large language models together with machine learning and natural language processing. I would also like to review the tests, hypotheses, and setup. Is a draft available?

    Ahmad, what are you looking for suggestions on?



    ------------------------------
    Ilya Ilienko, dual MBA, CPA, CMA
    Board Member / Director
    East Coast - United States
    ------------------------------



  • 3.  RE: Any suggestions for Using LLM to analyze Data Breach Disclosure?

    Posted 05-13-2025 03:48 PM

    Please read the abstract and comment on it. Do you have an alternative approach to studying the use of LLMs in accounting and finance?



    ------------------------------
    Ahmad H. Juma'h, Ph.D., CMA, CPA
    Professor of Accounting
    University of Illinois Springfield
    Illinois, USA
    ------------------------------



  • 4.  RE: Any suggestions for Using LLM to analyze Data Breach Disclosure?

    Posted 05-19-2025 09:12 AM

    @Ahmad Jumah

    Ahmad,

    This hits close to home - during my time at Deloitte’s Office of General Counsel and Chief Accounting Officer, I was involved in drafting and reviewing disclosures for some of the largest public company data breaches reported in 10-Ks and to state attorneys general. I deeply appreciate the direction and interdisciplinary nature of your work.

    Before offering specific suggestions, I’d like to make sure I fully understand the scope and goals of the study. I’ve rewritten the abstract in plain language - does this align with your vision?

    Plain-language version of the abstract:

    “We’re studying how publicly traded companies describe data breaches in the MD&A section of their 10-K filings. The goal is to determine whether the language they use contains any signals about the breach’s severity or potential financial impact.

    To do this, we use large language models (LLMs) - like ChatGPT, Claude, or custom-trained models - to analyze how these companies frame breach-related disclosures. We’re looking for linguistic cues that might help investors, analysts, or regulators assess risk or materiality.

    We also want to explore whether these cues can help predict downstream effects on the company - such as changes in financial performance, stock price, or credit ratings.

    Ultimately, the research aims to improve how companies report cyber incidents, make it easier for stakeholders to detect meaningful risks, and advance transparency and accountability in financial reporting.”

    On your question about “alternative approaches” - just to clarify, are you looking for a different methodological lens for studying breach disclosures specifically, or are you more broadly asking how LLMs might be applied across other areas of accounting and finance?

    A few methodological thoughts and clarifying questions that may refine the study design:

    1. Are breach disclosures typically found in MD&A sections with enough richness for LLM analysis?
      In my experience, they often appear in Risk Factors (Item 1A), Legal Proceedings, or MD&A - sometimes across all three. Disclosures may be boilerplate or vague - how do you plan to handle that? For reference, I’m attaching links below to 10-K excerpts from Equifax (2018) and SolarWinds (2020); use Ctrl+F in each to search for "breach".

      SolarWinds 2020 10-K: swi-20201231
      Equifax 2018 10-K: Document
    2. What performance metrics are you predicting?
      Is the impact defined in terms of stock price movement, operational disruption, market sentiment, credit ratings, or a combination?
    3. Who is the audience for the “financial impact”?
      Are you targeting insights useful to investors, regulators, corporate boards - or a cross-section?
    4. How do you plan to extract materiality cues using LLMs?
      Will you feed MD&A text directly into LLMs and prompt for features, or are you using fine-tuned models, embeddings, or classification layers on top of LLM output?
    5. What’s the strategy for vague or repetitive disclosures?
      If most breach narratives are templated or sanitized, could that dilute the predictive power of the model? Will there be a signal-to-noise filtering process?
    6. What exactly is the predictive innovation?
      Are you forecasting the likelihood of a specific financial outcome following a breach disclosure? Or identifying latent risk profiles from the language? Clarifying this could sharpen your contribution to both academic and industry audiences.
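
    To make questions 1 and 5 concrete, here is a minimal Python sketch of one possible pre-filtering step, not a description of the authors' pipeline. It assumes the 10-K sections have already been extracted as plain text, pulls paragraphs that mention a breach-related term, and drops paragraphs that closely match known boilerplate. The keyword pattern, the function names, and the 0.8 similarity threshold are all hypothetical choices made for illustration; the sample text is invented, not quoted from any filing.

```python
import re
from difflib import SequenceMatcher

# Hypothetical keyword pattern; a real study would need a broader, validated lexicon.
BREACH_PATTERN = re.compile(
    r"\b(breach(es|ed)?|cybersecurity incident|unauthorized access)\b",
    re.IGNORECASE,
)


def breach_passages(sections):
    """Return (section_name, paragraph) pairs that mention a breach term.

    `sections` maps a label such as "Item 1A" or "MD&A" to that section's text.
    Paragraphs are assumed to be separated by blank lines.
    """
    hits = []
    for name, text in sections.items():
        for para in re.split(r"\n\s*\n", text):
            para = " ".join(para.split())  # collapse internal whitespace
            if para and BREACH_PATTERN.search(para):
                hits.append((name, para))
    return hits


def boilerplate_score(paragraph, reference_paragraphs):
    """Highest character-level similarity (0 to 1) to any reference paragraph."""
    return max(
        (
            SequenceMatcher(None, paragraph.lower(), ref.lower()).ratio()
            for ref in reference_paragraphs
        ),
        default=0.0,
    )


def filter_informative(hits, reference_paragraphs, threshold=0.8):
    """Keep hits whose similarity to known boilerplate is below the threshold."""
    return [
        (name, para)
        for name, para in hits
        if boilerplate_score(para, reference_paragraphs) < threshold
    ]


# Invented sample text for illustration only.
sample_sections = {
    "Item 1A": (
        "We may be subject to a data breach that could harm our business.\n\n"
        "Our operations depend on suppliers."
    ),
    "MD&A": (
        "In 2020 we incurred $5 million of costs related to the "
        "cybersecurity incident."
    ),
}
# Hypothetical boilerplate, e.g. risk-factor language reused across filings.
sample_boilerplate = [
    "We may be subject to a data breach that could harm our reputation."
]

informative = filter_informative(
    breach_passages(sample_sections), sample_boilerplate
)
```

    Only the passages that survive this filter would be passed on to an LLM or classifier. Character-level similarity is a crude stand-in; embeddings or year-over-year comparison within the same company might separate templated language from incident-specific language more reliably.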

    Really excited about this work - happy to contribute deeper, whether as a reviewer, brainstorming partner, or even collaborator. You've touched on a genuinely important area at the intersection of finance, disclosure, risk, and AI. Let’s keep the dialogue going.

    Respectfully,
    Ilya Ilienko, (dual) MBA, CMA, CPA



    ------------------------------
    Ilya Ilienko, dual MBA, CPA, CMA
    Board Member / Director
    East Coast - United States
    ------------------------------


  • 5.  RE: Any suggestions for Using LLM to analyze Data Breach Disclosure?

    Posted 05-19-2025 09:19 AM

    Thank you, Ilya, for sharing your experience. This research is related to our GAI project, so let's discuss it whenever you are available.



    ------------------------------
    Ahmad H. Juma'h, Ph.D., CMA, CPA
    Professor of Accounting
    University of Illinois Springfield
    Illinois, USA
    ------------------------------



  • 6.  RE: Any suggestions for Using LLM to analyze Data Breach Disclosure?

    Posted 05-21-2025 11:34 AM

    Happy to jump in and continue the discussion, @Ahmad Jumah.

    Speak soon,

    Ilya



    ------------------------------
    Ilya Ilienko, dual MBA, CPA, CMA
    Board Member / Director
    East Coast - United States
    ------------------------------