A Benchmark for Automatic Evaluation of K–12 Science Instructional Materials
SciEval is the first benchmark dataset for Automatic Instructional Materials Evaluation (AIME) — a generative AI task where large language models evaluate K–12 science instructional materials by producing quality scores and evidence-based justifications aligned with the EQuIP rubric.
The dataset consists of NGSS-aligned science lessons sourced from OpenSciEd, annotated by trained science education researchers through a multi-round process with structured adjudication. Each lesson is evaluated across 13 criteria spanning three-dimensional learning design, instructional supports, and student progress monitoring.
| Column | Description |
|---|---|
| ID | Unique identifier (e.g., Course_lesson_N_Criterion) |
| File | Source PDF filename |
| Criterion | EQuIP rubric criterion code |
| Score | Rating: 0 (N/A), 1 (Inadequate), 2 (Adequate), 3 (Extensive) |
| Evidence | Detailed justification with page references |
| Pos_Evidence | Positive evidence examples |
| Neg_Evidence | Negative evidence / gaps |
| Advice | Reviewer recommendations |
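The schema above can be sketched with a plain Python dict. This is a minimal illustration (the example row values are invented, not taken from the dataset); the `parse_id` helper is a hypothetical convenience for splitting the composite ID into its lesson and criterion parts, assuming the `Course_lesson_N_Criterion` pattern shown above.

```python
# One annotation record, mirroring the column schema above.
# All field values here are illustrative placeholders.
row = {
    "ID": "Bodies_Work_lesson_1_I_A",
    "File": "Bodies_Work_lesson_1.pdf",
    "Criterion": "I_A",
    "Score": 2,
    "Evidence": "Learning is driven by making sense of phenomena...",
    "Pos_Evidence": "Anchoring phenomenon introduced early in the lesson.",
    "Neg_Evidence": "Limited student questioning in later activities.",
    "Advice": "Add opportunities for students to pose their own questions.",
}

SCORE_LABELS = {0: "N/A", 1: "Inadequate", 2: "Adequate", 3: "Extensive"}

def parse_id(row_id: str) -> tuple[str, str]:
    """Split an ID like 'Bodies_Work_lesson_1_I_A' into (lesson, criterion),
    assuming the Course_lesson_N_Criterion naming pattern."""
    course, _, rest = row_id.partition("_lesson_")
    number, _, criterion = rest.partition("_")
    return f"{course}_lesson_{number}", criterion

lesson, criterion = parse_id(row["ID"])
label = SCORE_LABELS[row["Score"]]
```

For the example record, `parse_id` yields `("Bodies_Work_lesson_1", "I_A")` and the score 2 maps to the label "Adequate".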
| Property | Value |
|---|---|
| Instructional units | 32 |
| Total pages | 4,499 |
| Avg. pages per lesson | 16.5 |
| Avg. words per lesson | ~5,908 |
| Score 0 (N/A) | 41.8% |
| Score 3 (Extensive) | 54.2% of non-N/A (active) ratings |
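To make the two distribution rows concrete: the N/A percentage is taken over all ratings, while the Extensive percentage is taken only over the active (non-N/A) ratings. A minimal sketch with an invented score list (not the real dataset counts):

```python
from collections import Counter

# Illustrative ratings only; the real dataset has 13 ratings per lesson.
scores = [0, 0, 3, 3, 3, 2, 1, 3, 0, 2]

counts = Counter(scores)
total = len(scores)
active = total - counts[0]  # ratings of 1-3 only

pct_na = 100 * counts[0] / total            # N/A share of ALL ratings
pct_extensive = 100 * counts[3] / active    # Extensive share of ACTIVE ratings
```

With this toy list, 3 of 10 ratings are N/A (30%), and 4 of the 7 active ratings are Extensive (about 57%), mirroring how the 41.8% and 54.2% figures in the table are defined.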
| ID | File | Criterion | Score | Evidence |
|---|---|---|---|---|
| Bodies_Work_lesson_1_I_A | Bodies_Work_lesson_1.pdf | I_A | 2 | Examples of evidence that learning is driven by making sense of phenomena i... |
| Bodies_Work_lesson_1_I_B_1 | Bodies_Work_lesson_1.pdf | I_B_1 | 3 | Asking Questions and Defining Problems: Ask questions that arise from caref... |
| Cancer_lesson_1_I_B_3 | Cancer_lesson_1.pdf | I_B_3 | 2 | Systems and System Models: Models (e.g., physical, mathematical, computer mo... |
| Cancer_lesson_1_II_A | Cancer_lesson_1.pdf | II_A | 3 | Students experience the phenomenon, problems, and investigative phenomenon as... |
| Ocean_Plastic_lesson_1_II_C | Ocean_Plastic_lesson_1.pdf | II_C | 3 | All science information is accurate and grade level appropriate based on the ... |
| Hail_rain_lesson_1_II_B | Hail_rain_lesson_1.pdf | II_B | 2 | Throughout the unit, students are provided with a large number of opportuniti... |
| Hail_rain_lesson_1_III_D | Hail_rain_lesson_1.pdf | III_D | 3 | Students are provided with the appropriate background needed for completing... |
| Ocean_Plastic_lesson_1_III_E | Ocean_Plastic_lesson_1.pdf | III_E | 3 | In Lesson 1 students are involved in learning about each of the CCC and ask q... |
Download the SciEval dataset to begin your research.
Each annotation file contains 13 rows (one per EQuIP criterion) with scores, evidence, and reviewer justifications. The PDFs and annotations are split into 218 training and 55 test documents via stratified sampling.
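A stratified split like the one above can be sketched in pure Python: group documents by a stratum label and hold out a fixed fraction of each group. This is a hedged illustration only; the paper's exact stratification variables and test fraction are assumptions here (the function name `stratified_split` and the ~20% test fraction are hypothetical).

```python
import random
from collections import defaultdict

def stratified_split(doc_ids, strata, test_frac=0.2, seed=0):
    """Split document IDs into train/test lists, holding out roughly
    test_frac of each stratum (e.g., the course a lesson belongs to)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for doc, stratum in zip(doc_ids, strata):
        groups[stratum].append(doc)
    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        n_test = max(1, round(test_frac * len(members)))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Hypothetical usage with placeholder document IDs and course labels.
docs = [f"doc_{i}" for i in range(10)]
courses = ["Bodies_Work"] * 5 + ["Cancer"] * 5
train, test = stratified_split(docs, courses)
```

Because the split is computed per group, each course contributes proportionally to both sides, which is what keeps the 218/55 train/test criterion distributions comparable.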
If you use SciEval in your research, please cite our paper:
```bibtex
@inproceedings{li2026scieval,
  title     = {SciEval: A Benchmark for Automatic Evaluation of
               K-12 Science Instructional Materials},
  author    = {Li, Zhaohui and He, Peng and Chen, Zhiyuan and
               Liu, Honglu and Wang, Zeyuan and Li, Tingting and
               Xiong, Jinjun},
  booktitle = {Proceedings of the 27th International Conference on
               Artificial Intelligence in Education (AIED)},
  year      = {2026}
}
```