August 4, 2024 · 6 min read

Custom GPTs for Science Data Tests

TL;DR

I built some custom GPTs using OpenAI's then-new Custom GPT functionality. It was a great little learning experience in how to further prompt and 'train' an OpenAI model with specific data. What started as a project for my colleagues in the Science and Technologies staffroom became something shared widely across Queensland.

Benjamin Hyde

Education Leader & AI Builder

When OpenAI first dropped Custom GPTs, my first thought was, "This could really save some time with planning if we can just work out how to make it consistent." My initial GPT was built to help out in Chemistry, specifically by generating more and more data tests. The students at my school sometimes have what feels like an insatiable appetite for practice tests and revision work, and honestly it's a nightmare to keep up with. Working closely with two colleagues, I built the first version of the prototype, but we found really quickly that it just didn't work consistently. Sometimes we'd get something that looked like a data test; sometimes we'd get something that really didn't.

Some quick context

These GPTs were not endorsed by QCAA, and I never presented them as official QCAA resources. They were teacher-made support tools I built using public materials plus guidance from Science colleagues. The goal was simple: give students more practice opportunities.

That distinction matters. The tools were never meant to replace endorsed assessment design, represent QCAA, or act as official material. I did hear some schools used generated data tests in endorsement workflows, which was wild to hear, but that was never the original intent.

How it started

Eventually I worked out that it wasn't about the prompt but about the extra material we could provide the model with. We started collating 'correct' examples alongside ones we had had endorsed over a couple of years. We took data tests that resembled correct ones and modified them until they were really close to something worth submitting for endorsement, and from there we got closer and closer to the consistent results we were looking for.

Once we got it close, I let it out into the wild. My colleagues loved it: they could really quickly produce one data test for each lesson and another for homework. After that success, my colleagues posted it on the LMS so students could use the link to generate their own, meaning we were no longer needed in that process and our students could very quickly do as much practice work as they wanted. This was awesome, as it really was our school's first step in demonstrating to students what effective use of Generative AI looks like.

It wasn't long before I was asked to make them for the other science domains we have at school. The initial versions for Physics, Biology and Agricultural Science were much quicker to build, mostly because I knew the backend prompt was right and only the uploaded source material needed to be contextualised to each subject. So I grabbed some endorsed data tests we had to use as material for each custom GPT, and within minutes we were getting something close; a much smaller fine-tuning loop got it right.

With that success in the bag, I attended the next QLDSNET meeting, where we had a round-the-room session in which each of us shared what we had done recently with AI. Honestly, I didn't think our idea was very special until, after I shared, multiple colleagues from different schools asked how they could get hold of it for their Science staff. And from there it took off. A friend I hadn't spoken to in years, teaching at an EQ school outside my network, forwarded me a screenshot of an email where my name and a link to the work had been shared, with the comment "When the f&*$ did you start teaching Biology?" We had a funny back and forth about how I don't, and I think that really changed my thinking about the impact I could have on the implementation of AI, not only in my subject area but across faculties.

I had a few requests for other Science domains as well, so I rounded out the collection with Psychology and Marine Science. I remember being asked why I hadn't included those to begin with, and honestly it was just because I had only expected this to be useful to colleagues at my school, so I had only built data test generators for the Sciences we offer.

What I learned

I think the biggest thing I took away from this was some confidence in getting started with shaping AI to do what we really needed it to do. No matter how much we refined the prompt for the LLM, it was the extra information and examples we provided as attachments that made it more consistent. For lack of a better term, that extra training data (I know it's not really training data) is what really made it work.
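That idea of steering the model with worked examples rather than ever-longer instructions can be sketched as a few-shot message list. The message shape below is the standard chat-completions format; the exemplar text and topic names are invented placeholders, not the material we actually attached:

```python
# Sketch: steering output format with exemplar pairs rather than a longer
# system prompt. All exemplar content here is invented for illustration.

system = "You write Queensland-style senior Chemistry data tests."

# Each exemplar is a (request, known-good data test) pair, standing in for
# the endorsed examples we attached to the Custom GPT.
exemplars = [
    ("Topic: reaction rates", "Stimulus: table of rate data...\nQ1 ..."),
    ("Topic: equilibrium", "Stimulus: concentration graph...\nQ1 ..."),
]

messages = [{"role": "system", "content": system}]
for request, good_test in exemplars:
    messages.append({"role": "user", "content": request})
    messages.append({"role": "assistant", "content": good_test})

# The actual request the teacher or student wants answered.
messages.append({"role": "user", "content": "Topic: acids and bases"})

# `messages` is now ready to send to a chat-completions style endpoint;
# the assistant turns act as the 'correct' examples the model imitates.
```

A Custom GPT does something similar for you behind the scenes: the uploaded files play the role of the exemplar turns.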

At that point I started playing with a framework for my own prompting, to replicate in normal prompts what we had achieved with the data tests. Remember, this was the early days of ChatGPT, so the models weren't as flexible as they are now when it came to shitty prompts. RIPO was born. I started teaching it to my students and found they got better results too. In those early days, the Role context was especially important for the model.

R - Role - the 'job' I want the model to take on.

I - Inputs - a summary of the inputs I'm going to provide.

P - Process - what I want the model to do.

O - Outputs - the expected output, with an example of it where possible.

For anyone teaching computing, the IPO part should look really familiar. I leant into what I already knew and got comfortable with it. If an LLM is a computer program, why not treat it like any other piece of code: Input, Process, Output. Simple.
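Treating RIPO like any other Input-Process-Output structure, it can be sketched as a small reusable template. The function name and section wording here are my illustration, not the exact prompt I used in class:

```python
# Minimal RIPO prompt builder: Role, Inputs, Process, Outputs.
# Section labels and example text are illustrative, not the exact
# classroom prompt.

def ripo_prompt(role: str, inputs: str, process: str, outputs: str) -> str:
    """Assemble a RIPO-structured prompt as a single string."""
    return "\n\n".join([
        f"Role: {role}",
        f"Inputs: {inputs}",
        f"Process: {process}",
        f"Outputs: {outputs}",
    ])

prompt = ripo_prompt(
    role="You are an experienced senior Chemistry teacher.",
    inputs="A unit topic plus two example data tests I will paste below.",
    process="Write one new practice data test in the style of the examples.",
    outputs="A stimulus, five short-response questions, and a marking guide.",
)
print(prompt)
```

Students fill in the four arguments in plain language; the structure does the rest, which is exactly the point of the framework.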

Closing thought

This was really the beginning of my AI journey. It was a great process, I learned so much throughout, and it was really nice to have created something with an impact far beyond what I ever expected.

The uptake surprised me most. I built this for local use, but it spread much further across Queensland. By 4 August 2024, the Biology Data Test GPT alone had already passed 250 uses.

Update (March 2026): the full GPT set has now gone past 10,000 conversations in total.

Build Notes

Approach

Start with one concrete teacher pain point and build a narrow Custom GPT around it.

Tools Used

OpenAI Custom GPTs, prompt engineering, ongoing staff feedback loops

What Worked

Less repeated workload for revision prep, more practice opportunities for students, and a lower barrier for staff who didn't want to write complex prompts.

What Failed

Accuracy was always the weak spot, subject nuance depended heavily on expert feedback, and ongoing syllabus alignment takes regular maintenance. Free-tier daily usage limits also frustrated some users.

What's Next

Keep refining against newer syllabus expectations, improve subject-specific nuance handling, and only expand where there is a clear teacher need.