What is an efficient way to find the job for an OID?
Each job COB is for one Git object ID (typically a commit). If CI runs for every change over the years, we will end up with many job COBs. To add a new run, we first need to map a Git OID to a job ID. At the moment I’m using code like this:
fn job_for_commit<'a>(jobs: &Jobs<'a, Repository>, wanted: Oid) -> Result<JobId, JobError> {
    eprintln!("job_for_commit: wanted={wanted}");
    for item in jobs.all().map_err(JobError::AllJobs)? {
        let (job_id, job) = item.map_err(JobError::AllJobsJob)?;
        let job_id = JobId::from(job_id);
        eprintln!("job_for_commit: consider {job_id} with oid {}", job.oid());
        if job.oid() == &wanted {
            eprintln!("job_for_commit: wanted={wanted} => {job_id}");
            return Ok(job_id);
        }
    }
    Err(JobError::NoJob(wanted))
}
This is a linear search. With relatively small numbers of COBs, it’s fine. I’ve not benchmarked this, but I’m assuming up to tens of thousands of COBs is fine. At some point, linear search will be noticeably slow. It might not matter, given it happens on a CI node, but in the interest of at least trying to avoid predictable bottlenecks, is there a more efficient way of mapping a Git OID to the job IDs that refer to it?
Maybe I’m missing something, but can you use Jobs::get(JobId::from(wanted))? This should be a similar process to getting a Patch.
Does that actually look up a job ID given a commit ID? https://docs.rs/radicle-job/latest/src/radicle_job/lib.rs.html#114-118 seems to only convert the ObjectId into a JobId without any lookup.
Ah! You want to find a Job for a given commit? So, a reverse lookup of what Jobs are running for a given commit?
That indeed does not exist. I’m afraid that a Git repository is not going to be a good fit for this either. To do this efficiently you’d need a layer that caches that relationship.
Right, that’s what I was afraid of. Let’s talk about what the relationship cache should look like once this starts to become a problem. For now, linear lookup is good enough.
Actually, an in-memory cache is probably good enough for my purposes: a HashMap from commit object ID to job ID. If it’s not persisted, the next process will need to start with an empty cache, but the access patterns for the CI broker are probably fine with a very occasional slow lookup.
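For illustration, here is a minimal sketch of what such an in-memory cache might look like. It is not taken from radicle-job or the CI broker: `JobCache` is a hypothetical name, and `Oid` and `JobId` are plain `String` stand-ins for the real types, so the sketch compiles on its own. The slow path is passed in as a closure, which in practice could be the linear scan shown earlier.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real Git object ID and job ID types.
type Oid = String;
type JobId = String;

/// In-memory cache from commit object ID to job ID (hypothetical type).
/// It is not persisted, so every new process starts with an empty cache.
#[derive(Default)]
struct JobCache {
    by_oid: HashMap<Oid, JobId>,
}

impl JobCache {
    /// Return the cached job ID for `wanted` if there is one. Otherwise
    /// call `slow_lookup` (for example, a linear scan over all job COBs)
    /// and remember the result if it succeeds. Errors are not cached.
    fn job_for_commit<E>(
        &mut self,
        wanted: &Oid,
        slow_lookup: impl FnOnce(&Oid) -> Result<JobId, E>,
    ) -> Result<JobId, E> {
        if let Some(job_id) = self.by_oid.get(wanted) {
            return Ok(job_id.clone());
        }
        let job_id = slow_lookup(wanted)?;
        self.by_oid.insert(wanted.clone(), job_id.clone());
        Ok(job_id)
    }
}
```

With this shape, only the first lookup for a given commit pays for the scan; later lookups in the same process are a single hash lookup. A failed lookup is deliberately not remembered, so a job COB created later for the same commit can still be found.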