What is an efficient way to find the job for an OID? — radicle-job

Radicle Job Collaborative Object

liw opened 11 months ago

Each job COB is for one Git object ID (typically a commit). If we run CI for every change over the years, we will accumulate many job COBs. To add a new run, we first need to map a Git OID to a job ID. At the moment I'm using code like this:

fn job_for_commit<'a>(jobs: &Jobs<'a, Repository>, wanted: Oid) -> Result<JobId, JobError> {
    eprintln!("job_for_commit: wanted={wanted}");
    for item in jobs.all().map_err(JobError::AllJobs)? {
        let (job_id, job) = item.map_err(JobError::AllJobsJob)?;
        let job_id = JobId::from(job_id);
        eprintln!("job_for_commit: consider {job_id} with oid {}", job.oid());
        if job.oid() == &wanted {
            eprintln!("job_for_commit: wanted={wanted} => {job_id}");
            return Ok(job_id);
        }
    }

    Err(JobError::NoJob(wanted))
}

This is a linear search. With relatively small numbers of COBs, it's fine. I've not benchmarked this, but I'm assuming up to tens of thousands of COBs is fine. At some point, linear search will be noticeably slow. It might not matter, given it happens on a CI node, but in the interest of at least trying to avoid predictable bottlenecks, is there a more efficient way of mapping a Git OID to the job IDs that refer to it?

z6MkireR...3voM commented 11 months ago

Maybe I'm missing something, but can you use Jobs::get(JobId::from(wanted))? This should be a similar process to getting a Patch.

liw commented 10 months ago

Does that actually look up a job ID given a commit id?

liw commented 10 months ago

https://docs.rs/radicle-job/latest/src/radicle_job/lib.rs.html#114-118 seems to only convert the ObjectId into a JobId without any lookup.

z6MkireR...3voM commented 10 months ago

Ah! You want to find a Job for a given commit? So, a reverse lookup of what Jobs are running for a given commit?

That indeed does not exist. I'm afraid that a Git repository is not going to be a good fit for this either. To do this efficiently you'd need a layer that caches that relationship.

liw commented 10 months ago

Right, that's what I was afraid of. Let's talk about what the relationship cache should look like once this starts to become a problem. For now, linear lookup is good enough.

Actually, an in-memory cache is probably good enough for my purposes. A HashMap from commit object ID to job ID. If it's not persisted, the next process will need to start with an empty cache, but the access patterns for the CI broker are probably fine with a very occasional slow lookup.
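
A minimal sketch of such an in-memory cache, for illustration only. CommitOid, JobId and JobCache here are hypothetical stand-ins, not the radicle-job types, and the slow path is passed in as a closure so that the existing linear search could be plugged in unchanged. Failed lookups are deliberately not cached, on the assumption that a job for the commit may be created later.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Git object ID (not the real radicle type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct CommitOid(pub [u8; 20]);

/// Hypothetical stand-in for a job COB ID (not the real radicle-job type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct JobId(pub [u8; 20]);

/// Non-persistent cache mapping a commit OID to the job that refers to it.
/// It starts empty in every process, so the first lookup of each commit
/// still pays for the slow search.
#[derive(Default)]
pub struct JobCache {
    by_commit: HashMap<CommitOid, JobId>,
}

impl JobCache {
    pub fn new() -> Self {
        Self::default()
    }

    /// Return the job ID for `wanted`. On a cache miss, call `scan`
    /// (for example, the linear search over all jobs) and remember a
    /// successful result. Errors are returned as-is and not cached.
    pub fn job_for_commit<E>(
        &mut self,
        wanted: CommitOid,
        scan: impl FnOnce(CommitOid) -> Result<JobId, E>,
    ) -> Result<JobId, E> {
        if let Some(job_id) = self.by_commit.get(&wanted) {
            return Ok(*job_id);
        }
        let job_id = scan(wanted)?;
        self.by_commit.insert(wanted, job_id);
        Ok(job_id)
    }
}
```

With something like this, the CI broker would only fall back to the linear search once per commit per process, which matches the "very occasional slow lookup" access pattern described above.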