Continuing from Prompt Processing with AuxTel Imaging Survey Data 2023

Slack: #auxtel-prompt-processing

APDB: `rubin@usdf-prompt-processing.slac.stanford.edu/lsst-devl` in schema `pp_apdb_latiss`. Also see Accessing the APDB in the USDF.
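Below is a minimal Python sketch of how the APDB spec above could map onto a libpq keyword/value connection string. It assumes PostgreSQL on the default port 5432, a `user@host/dbname` spec, and schema selection through `search_path`. The function name `apdb_dsn` is hypothetical and password handling is left out. Accessing the APDB in the USDF remains the authoritative reference.

```python
"""Sketch: turn the APDB spec from this page into a libpq-style DSN.

Assumptions (not from this page): PostgreSQL on port 5432, a spec of the
form user@host/dbname, and schema selection via search_path. Passwords are
not handled here; libpq would normally read them from ~/.pgpass.
"""


def apdb_dsn(spec: str, schema: str, port: int = 5432) -> str:
    """Build a libpq keyword/value connection string from 'user@host/dbname'."""
    user, _, rest = spec.partition("@")
    host, _, dbname = rest.partition("/")
    if not (user and host and dbname):
        raise ValueError(f"expected user@host/dbname, got {spec!r}")
    # options='-c search_path=...' makes unqualified table names resolve
    # in the APDB schema for this session.
    return (
        f"host={host} port={port} dbname={dbname} user={user} "
        f"options='-c search_path={schema}'"
    )


if __name__ == "__main__":
    dsn = apdb_dsn(
        "rubin@usdf-prompt-processing.slac.stanford.edu/lsst-devl",
        "pp_apdb_latiss",
    )
    print(dsn)
    # With psycopg2 installed, the string could be passed to
    # psycopg2.connect(dsn); that step is not run here.
```

Keeping the schema in `search_path` rather than in every query lets ad hoc SQL refer to APDB tables without a schema prefix.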

Link to the AuxTel calendar to see when the next run is planned.

day_obs of data collection

Observing run

Tag of prompt_processing or prompt-service

Output collection in /repo/embargo

Notes

templates collection (as chained in LATISS/templates)



LATISS/prompt/output-<day_obs>









2024-04-25


2.6.0 (w_2024_16)




2024-04-24


2.6.0 (w_2024_16)

LATISS/prompt/output-2024-04-24

10 AUXTEL_PHOTO_IMAGING nextVisit events, 8 raws

  • 2 canceled
  • 8 successful ApPipe runs
(no change)
2024-04-22


2.6.0 (w_2024_16)

LATISS/prompt/output-2024-04-22

LATISS/prompt/output-2024-04-22/ApPipe-noForced/prompt-proto-service-latiss-00083

LATISS/prompt/output-2024-04-22/SingleFrame/prompt-proto-service-latiss-00083

154 AUXTEL_PHOTO_IMAGING nextVisit events, 153 raws

  • 1 canceled
  • 8 failed calibrateImage: 5 DM-43588, DM-43593
  • 115 successful ApPipe runs 
  • 28 successful single frame runs
  • 2 timed out waiting for image. The images arrived after the timeout.  
(no change)
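Each night's outcome bullets are meant to account for every nextVisit event. Here is a small Python sketch of that bookkeeping check, using the 2024-04-22 counts above (1 + 8 + 115 + 28 + 2 = 154). The category labels and the function `check_night` are illustrative only.

```python
from collections import Counter

# Outcome counts for day_obs 2024-04-22, copied from the row above.
# The labels are shortened paraphrases, not official category names.
OUTCOMES_2024_04_22 = Counter({
    "canceled": 1,
    "failed calibrateImage": 8,
    "successful ApPipe": 115,
    "successful single frame": 28,
    "timed out waiting for image": 2,
})


def check_night(n_events: int, outcomes: Counter) -> int:
    """Return nextVisit events minus the sum of all outcome counts."""
    return n_events - sum(outcomes.values())


if __name__ == "__main__":
    # 0 means every nextVisit event is accounted for.
    print(check_night(154, OUTCOMES_2024_04_22))
```

A nonzero result is not necessarily an error, because categories can overlap. On 2024-02-06, for example, 2 of the 8 premature pod shutdowns happened while waiting for a canceled visit, so the bullets sum to 96 against 94 events.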
2024-04-19


2.6.0 (w_2024_16)

LATISS/prompt/output-2024-04-19

LATISS/prompt/output-2024-04-19/ApPipe-noForced/prompt-proto-service-latiss-00083

69 AUXTEL_PHOTO_IMAGING nextVisit events, 64 raws

  • 3 events were canceled
  • 2 no images
  • 63 successful runs 
  • 1 image arrived too late (~200 s after the nextVisit event); the pod timed out.
(no change)
2024-04-18


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-18

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • Failed with a Postgres error like the previous night. The failure happened on one specific pod that was ~1 week old, the same troublesome pod as the previous night. After the pod was killed manually, some other pods were able to process normally.
  • 6 successful runs (after the manual intervention) 
(no change)
2024-04-17


2.5.0 (w_2024_14)

N/A

99 AUXTEL_PHOTO_IMAGING nextVisit events, 1 canceled, 97 raws

All failed with a Postgres error, possibly caused by an underlying network error.

(no change)
2024-04-16


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-16

LATISS/prompt/output-2024-04-16/ApPipe-noForced/prompt-proto-service-latiss-00082

Starting on 2024-04-11, 30 pods are kept running continuously to test whether a Knative scale-down bug is the source of the dropped-pod issue.

(autoscaling.knative.dev/min-scale: '30')
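As a hedged sketch, the min-scale setting above can be expressed as a JSON merge patch on the Knative Service's revision template. The service name `prompt-proto-service-latiss` is an assumption inferred from the revision names in the output collections, the namespace is omitted, and the helper `min_scale_patch` is hypothetical. The script only prints a `kubectl patch` command and does not run it.

```python
import json

# Assumed service name, inferred from revision names such as
# prompt-proto-service-latiss-00082; not confirmed by this page.
SERVICE = "prompt-proto-service-latiss"


def min_scale_patch(min_scale: int) -> dict:
    """Merge patch setting the minimum replica count on the revision template."""
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # Kubernetes annotation values must be strings.
                        "autoscaling.knative.dev/min-scale": str(min_scale)
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    patch = json.dumps(min_scale_patch(30))
    # Print the command instead of executing it.
    print(f"kubectl patch ksvc {SERVICE} --type merge -p '{patch}'")
```

Knative creates a new revision whenever the revision template changes, so a patch like this would also change the revision suffix seen in the output collection names.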

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws, 96 successful runs

(no change)
2024-04-10


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-10

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 93 successful runs
  • 3 premature pod shutdown
(no change)
2024-04-09


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-09

LATISS/prompt/output-2024-04-09/ApPipe-noForced/prompt-proto-service-latiss-00079

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 90 successful runs
  • 2 failed calibrateImage: DM-43588, DM-43777 
  • 4 premature pod shutdown
(no change)
2024-04-08


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-08

LATISS/prompt/output-2024-04-08/ApPipe-noForced/prompt-proto-service-latiss-00079

56 AUXTEL_PHOTO_IMAGING nextVisit events, 56 raws

  • 51 successful runs
  • 1 failed calibrateImage DM-43588
  • 4 premature pod shutdown
(no change)
2024-04-05


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-05


LATISS/prompt/output-2024-04-05/ApPipe-noForced/prompt-proto-service-latiss-00079

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 90 successful runs
  • 6 premature pod shutdown
(no change)
2024-04-04


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-04


LATISS/prompt/output-2024-04-04/ApPipe-noForced/prompt-proto-service-latiss-00079

Dan patched the readiness probe settings for DM-41829.

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 84 successful runs
  • 1 connection failed, no prompt-service
  • 5 timed out waiting for image DM-39022
  • 6 premature pod shutdown
(no change)
2024-04-03


2.4.0 (d_2024_03_29) 
(454de5c9)

LATISS/prompt/output-2024-04-03

98 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 1 was canceled
  • 1 no raw image
  • 85 successful runs
  • 1 failed calibrateImage DM-43588
  • 6 timed out waiting for image
  • 4 started running pipeline but got premature pod shutdown
(no change)
2024-04-02


2.4.0 (d_2024_03_29) 
(454de5c9)

LATISS/prompt/output-2024-04-02

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 83 successful runs
  • 1 connection refused
  • 2 timed out DM-39022, DM-42825
  • 10 premature pod shutdown DM-41829
(no change)
2024-04-01


2.4.0 (d_2024_03_29) 
(454de5c9)

LATISS/prompt/output-2024-04-01

LATISS/prompt/output-2024-04-01/ApPipe-noForced/prompt-proto-service-latiss-00076

Started to use ApPipe-noForced.yaml 


76 AUXTEL_PHOTO_IMAGING nextVisit events, 76 raws
  • 72 successful runs
  • 3 hit broker communication failure DM-43590
  • 1 premature pod shutdown DM-41829


(no change)
2024-03-29


2.4.0 (d_2024_03_29) 
(454de5c9)

LATISS/prompt/output-2024-03-29

LATISS/prompt/output-2024-03-29/ApPipe/prompt-proto-service-latiss-00075

Isolated node test continues; LATISS storage allocation reduced to 20 GiB to support more pods.


96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 72 successful runs
  • 2 failed calibrateImage with partial outputs: DM-43593 and MeasureApCorrError
  • 4 hit broker communication failure DM-43590; two with partial outputs, two without outputs
  • 1 TCP connection reset by peer; no prompt-service
  • 6 timed out (pod ready after image arrival) DM-39022, DM-42825
  • 10 premature pod shutdown
  • 1 hit the 900 s timeout
(no change)
2024-03-28


2.3.0 (d_2024_03_26) 

(6a646b51)

LATISS/prompt/output-2024-03-28

LATISS/prompt/output-2024-03-28/ApPipe/prompt-proto-service-latiss-00060

Ran on isolated nodes to rule out resource contention as an issue. Alert prod is on. 

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 53 successful runs
  • 36 timeouts because the isolated mode couldn't allocate enough pods
  • 4 failed calibrateImage in denormalizeMatches DM-43588
  • 1 failed calibrateImage with MeasureApCorrError DM-43306
  • 1 failed diaPipe with broker transport failure DM-43590
  • 1 worker timed out
(no change)
2024-03-27


2.2.2 (w_2024_12) (87ceee2)

LATISS/prompt/output-2024-03-27

LATISS/prompt/output-2024-03-27/ApPipe/prompt-proto-service-latiss-00058

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

Around 20:45 PT, a k8s node went offline with pods stuck in terminating, affecting APDB Postgres, the Knative controller, a running prompt-processing pod, etc.

  • 81 successful runs
  • 5 premature pod shutdown 
  • 3 failed in DIA with problems connecting to APDB
  • 7 not processed because fan out was stuck 
(no change)
2024-03-26


2.2.2 (w_2024_12) (87ceee2)

LATISS/prompt/output-2024-03-26

LATISS/prompt/output-2024-03-26/ApPipe-noForced/prompt-proto-service-latiss-00057

ApPipe-noForced.yaml was used 

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 87 successful runs
  • 1 failed calibrateImage with MeasureApCorrError DM-43306
  • 8 premature pod shutdown DM-41829
(no change)
2024-03-25


2.2.2 (w_2024_12) (87ceee2)

LATISS/prompt/output-2024-03-25

LATISS/prompt/output-2024-03-25/ApPipe/prompt-proto-service-latiss-00056

On 2024-03-22, Kafka and the Knative controller were moved off the Intel NIC node they had been running on.

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 95 outputs
  • 1 premature pod shutdown DM-41829
(no change)
2024-03-21


2.2.0 (w_2024_12)

(a0d41eae6b)

LATISS/prompt/output-2024-03-21/ApPipe/prompt-proto-service-latiss-00053

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 94 successful runs
  • 2 premature pod shutdown DM-41829
(no change)

2024-03-19



2.1.0 (w_2024_10)

LATISS/prompt/output-2024-03-19/ApPipe/prompt-proto-service-latiss-0

On 2024-03-18, the prompt service was moved off the Intel NIC nodes that could cause dropped pods.


13 AUXTEL_PHOTO_IMAGING nextVisit events, 11 raws

  • 2 were canceled
  • 9 successful runs
  • 2 partial outputs: 1 DM-43247/DM-43277, 1 "Failed to determine psfex psf: too few good stars." DM-43777
(no change)
2024-03-14


2.1.0 (w_2024_10)

LATISS/prompt/output-2024-03-14/ApPipe/prompt-proto-service-latiss-00051

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 94 successful runs
  • 2 premature pod shutdown DM-41829
(no change)
2024-03-13

N/A

2.1.0 (w_2024_10)

LATISS/prompt/output-2024-03-13/ApPipe/prompt-proto-service-latiss-00050

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 91 successful runs
  • 1 failed calibrateImage DM-43248
  • 4 premature pod shutdown DM-41829
(no change)
2024-03-12

N/A

2.1.0 (w_2024_10)

LATISS/prompt/output-2024-03-12/ApPipe/prompt-proto-service-latiss-00049

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 66 successful runs
  • 24 failed calibrateImage with MeasureApCorrError DM-43306
  • 1 failed calibrateImage with DM-43247
  • 5 premature pod shutdown DM-41829
(no change)
2024-03-08

N/A

2.1.0 (w_2024_10)

LATISS/prompt/output-2024-03-08/ApPipe/prompt-proto-service-latiss-00049

97 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 1 was canceled
  • 91 successful runs
  • 5 premature pod shutdown DM-41829
(no change)
2024-03-07

N/A

2.1.0 (02348c99f) (w_2024_10)

LATISS/prompt/output-2024-03-07/ApPipe/prompt-proto-service-latiss-00049

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 90 successful runs, with the new "initial_pvi"
  • 3 partial outputs, calibrateImage failed. DM-43247, DM-43248
  • 3 premature pod shutdown DM-41829
(no change)
2024-03-06

N/A

w_2024_08 (1e350bec)

LATISS/prompt/output-2024-03-06/ApPipe/prompt-proto-service-latiss-00041

65 AUXTEL_PHOTO_IMAGING nextVisit events, 64 raws

  • 1 was canceled 
  • 63 successful runs 
  • 1 premature pod shutdown DM-41829.
(no change)
2024-03-05

N/A

w_2024_08 (1e350bec)

LATISS/prompt/output-2024-03-05/ApPipe/prompt-proto-service-latiss-00041

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 96 successful runs 
(no change)
2024-03-04

N/A

w_2024_08 (1e350bec)

N/A

100 raws; nextVisit did not arrive at USDF on time

(no change)
2024-03-03

N/A

w_2024_08 (1e350bec)

N/A

28 raws; nextVisit did not arrive at USDF on time

(no change)
2024-03-02

N/A

w_2024_08 (1e350bec)

N/A

76 raws; nextVisit did not arrive at USDF on time

(no change)
2024-03-01

N/A

w_2024_08 (1e350bec)

LATISS/prompt/output-2024-03-03/ApPipe/prompt-proto-service-latiss-00040

96 raws; nextVisit did not arrive at USDF on time

One exposure was processed on 2024-03-03 at ~15:46 PT, when EFD data and the prompt-processing Kafka came back online.

(no change)
2024-02-29

N/A

w_2024_08 (1e350bec)

LATISS/prompt/output-2024-02-29/ApPipe/prompt-proto-service-latiss-00040

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 93 have pipeline outputs 
    • 24 successful runs
    • 69 partial outputs, "WCS fit failed" in CalibrateTask DM-43160
  • 3 premature pod shutdown DM-41829.
(no change)
2024-02-27

N/A

w_2024_08 (1e350bec)

LATISS/prompt/output-2024-02-27/ApPipe/prompt-proto-service-latiss-00040

20 AUXTEL_PHOTO_IMAGING nextVisit events, 20 raws

  • 13 successful runs
  • 7 premature pod shutdown DM-41829.
(no change)
2024-02-26

N/A

w_2024_08 (1e350bec)

LATISS/prompt/output-2024-02-26/ApPipe/prompt-proto-service-latiss-00040

32 AUXTEL_PHOTO_IMAGING nextVisit events, 24 raws

  • 6 groups were canceled
  • 2 groups never arrived; likely failed. 
  • 4 ApPipe outputs
  • 18 partial outputs (postISRCCD). Failed in characterizeImage
  • 2 premature pod shutdown DM-41829. The Knative scale-down delay was extended to 5 min on 02-23, but network resets can still trip the pods.
(no change)

2024-02-19 

2024-02-20

N/A

w_2024_05 (54e20b0)

N/A

AUXTEL_PHOTO_IMAGING was run, but neither nextVisit events nor data were transferred to USDF: IHS-7703

(no change)
2024-02-13

SUMMIT-8516

w_2024_05 (54e20b0)

LATISS/prompt/output-2024-02-13/ApPipe/prompt-proto-service-latiss-00039    

LATISS/prompt/output-2024-02-13/SingleFrame/prompt-proto-service-latiss-00039

Started using the "latiss_prompt" postgres user to access /repo/embargo

43 AUXTEL_PHOTO_IMAGING nextVisit events, 41 raws

  • 2 were canceled
  • 35 successful runs with either ApPipe or single frame products
  • 6 premature pod shutdown DM-41829


(no change)
2024-02-12

SUMMIT-8516

w_2024_05 (54e20b0)

LATISS/prompt/output-2024-02-12/ApPipe/prompt-proto-service-latiss-00038

LATISS/prompt/output-2024-02-12/SingleFrame/prompt-proto-service-latiss-00038


112 AUXTEL_PHOTO_IMAGING nextVisit events, 105 raws

  • 6 were canceled
  • 1 image never arrived
  • 1 file arrived after the timeout
  • 38 ApPipe products
  • 47 single frame products
  • 2 pipeline failures with partial products
  • 17 premature shutdown DM-41829

(no change)
2024-02-08

N/A

w_2024_05 (54e20b0)

LATISS/prompt/output-2024-02-08/ApPipe/prompt-proto-service-latiss-00038

64 AUXTEL_PHOTO_IMAGING nextVisit events, 63 raws

  • 1 image never arrived
  • 45 ApPipe outputs
  • 6 timed out waiting for image DM-39022, DM-42825
  • 12 premature pod shutdown DM-41829
(no change)
2024-02-07

N/A

w_2024_05 (54e20b0)

LATISS/prompt/output-2024-02-07/ApPipe/prompt-proto-service-latiss-00038

100 AUXTEL_PHOTO_IMAGING nextVisit events, 99 raws

  • 1 image never arrived
  • 90 ApPipe outputs
  • 9 premature pod shutdown DM-41829
(no change)
2024-02-06

N/A

d_2024_01_31 (7fbe0199)

LATISS/prompt/output-2024-02-06/ApPipe/prompt-proto-service-latiss-00037

94 AUXTEL_PHOTO_IMAGING nextVisit events, 89 raws

  • 3 were canceled
  • 2 images never arrived
  • 83 ApPipe outputs
  • 8 premature pod shutdown DM-41829 (including 2 while waiting for a canceled visit)
(no change)
2024-02-05

N/A

d_2024_01_31 (7fbe0199)

N/A

16 AUXTEL_PHOTO_IMAGING nextVisit events, 8 raws

  • 6 were canceled
  • 2 images never arrived
  • 8 had mismatched APDB schema (DM-42798)
(no change)
2024-01-31

SUMMIT-8438

DM-42710-d_2024_01_22
(cf97df28)

N/A

1 event was sent and canceled. No image.

(no change)

2024-01-30

SUMMIT-8438

DM-42710-d_2024_01_22
(cf97df28)

LATISS/prompt/output-2024-01-30/ApPipe/prompt-proto-service-latiss-00036

k8s vCluster was upgraded to 1.26.9.
Knative was upgraded from 1.11.0 to 1.12.3.

60 AUXTEL_PHOTO_IMAGING nextVisit events, 60 raws.

  • 51 ApPipe outputs 
  • 9 premature pod shutdown DM-41829
(no change)
2024-01-29

SUMMIT-8438

DM-42710-d_2024_01_22
(cf97df28) which is d_2024_01_22 plus a pipeline config override to use goodSeeing

LATISS/prompt/output-2024-01-29/ApPipe/prompt-proto-service-latiss-00036

19 AUXTEL_PHOTO_IMAGING nextVisit events, 17 raws

  • 2 were canceled
  • 10 had ApPipe outputs 
  • 5 processed successfully but failed to export because the dataset type goodSeeingDiff_longTrailedSrc was not found in the repo; it was added by hand during the survey
  • 2 premature pod shutdown; started processing but didn't finish DM-41829


(new)
goodSeeingCoadd LATISS/runs/AUXTEL_DRP_IMAGING_20230509_20231207/w_2023_49/PREOPS-4648/20231212T162338Z
