Comment: 2024-04-18

...

day_obs of data collection

Observing run

Tag of prompt_processing or prompt-service

Output collection in /repo/embargo

Notes

templates collection (as chained in LATISS/templates)



LATISS/prompt/output-<day_obs>















2024-04-19


2.6.0 (w_2024_16)



(no change)
2024-04-18


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-18

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 6 successful runs 
  • All others failed with a Postgres error like the previous night. The failure happened on one specific pod that was ~1 week old, the same troublesome pod as the previous night. The pod was killed manually, after which other pods were able to process normally.
  • 6 successful runs (after the manual intervention) 
(no change)
2024-04-17


2.5.0 (w_2024_14)

N/A

99 AUXTEL_PHOTO_IMAGING nextVisit events, 1 canceled, 97 raws

All failed with a Postgres error, possibly from an underlying network error.

(no change)
2024-04-16


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-16

LATISS/prompt/output-2024-04-16/ApPipe-noForced/prompt-proto-service-latiss-00082

Starting on 2024-04-11, 30 pods are kept running continuously to test whether a Knative scale-down bug is the source of the dropped-pod issue.

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws, 96 successful runs

(no change)
2024-04-10


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-10

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 93 successful runs
  • 3 premature pod shutdown
(no change)
2024-04-09


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-09

LATISS/prompt/output-2024-04-09/ApPipe-noForced/prompt-proto-service-latiss-00079

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 90 successful runs
  • 2 failed calibrateImage: DM-43588, DM-43777 
  • 4 premature pod shutdown
(no change)
2024-04-08


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-08

LATISS/prompt/output-2024-04-08/ApPipe-noForced/prompt-proto-service-latiss-00079

56 AUXTEL_PHOTO_IMAGING nextVisit events, 56 raws

  • 51 successful runs

  • 1 failed calibrateImage DM-43588
  • 4 premature pod shutdown

(no change)
2024-04-05


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-05


LATISS/prompt/output-2024-04-05/ApPipe-noForced/prompt-proto-service-latiss-00079

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 90 successful runs
  • 6 premature pod shutdown
(no change)
2024-04-04


2.5.0 (w_2024_14)

LATISS/prompt/output-2024-04-04


LATISS/prompt/output-2024-04-04/ApPipe-noForced/prompt-proto-service-latiss-00079

Dan patched the readiness probe settings for DM-41829

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 84 successful runs
  • 1 connection failed, no prompt-service
  • 5 timed out waiting for image DM-39022
  • 6 premature pod shutdown
(no change)
2024-04-03


2.4.0 (d_2024_03_29) 
(454de5c9)

LATISS/prompt/output-2024-04-03

98 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 1 was canceled
  • 1 no raw image
  • 85 successful runs
  • 1 failed calibrateImage DM-43588
  • 6 timed out waiting for image
  • 4 started running pipeline but got premature pod shutdown
(no change)
2024-04-02


2.4.0 (d_2024_03_29) 
(454de5c9)

LATISS/prompt/output-2024-04-02

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 83 successful runs
  • 1 connection refused
  • 2 timed out DM-39022, DM-42825
  • 10 premature pod shutdown DM-41829
(no change)
2024-04-01


2.4.0 (d_2024_03_29) 
(454de5c9)

LATISS/prompt/output-2024-04-01

LATISS/prompt/output-2024-04-01/ApPipe-noForced/prompt-proto-service-latiss-00076

Started to use ApPipe-noForced.yaml 


76 AUXTEL_PHOTO_IMAGING nextVisit events, 76 raws
  • 72 successful runs
  • 3 hit broker communication failure DM-43590
  • 1 premature pod shutdown DM-41829


(no change)
2024-03-29


2.4.0 (d_2024_03_29) 
(454de5c9)

LATISS/prompt/output-2024-03-29

LATISS/prompt/output-2024-03-29/ApPipe/prompt-proto-service-latiss-00075

Isolated node test continues; LATISS storage allocation reduced to 20 GiB to support more pods.


96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 72 successful runs
  • 2 failed calibrateImage with partial outputs: DM-43593 and MeasureApCorrError
  • 4 hit broker communication failure DM-43590. Two with partial outputs, two without outputs.
  • 1 TCP connection reset by peer. No prompt-service.
  • 6 timed out/pod ready after image arrival DM-39022, DM-42825
  • 10 premature pod shutdown 
  • 1 hit 900 sec timeout  
(no change)
2024-03-28


2.3.0 (d_2024_03_26) 

(6a646b51)

LATISS/prompt/output-2024-03-28

LATISS/prompt/output-2024-03-28/ApPipe/prompt-proto-service-latiss-00060

Ran on isolated nodes to rule out resource contention as an issue. Alert prod is on. 

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 53 successful runs
  • 36 timeouts because the isolated mode couldn't allocate enough pods
  • 4 failed calibrateImage in denormalizeMatches DM-43588
  • 1 failed calibrateImage with MeasureApCorrError DM-43306
  • 1 failed diaPipe with broker transport failure DM-43590
  • 1 worker timed out
(no change)
2024-03-27


2.2.2 (w_2024_12) (87ceee2)

LATISS/prompt/output-2024-03-27

LATISS/prompt/output-2024-03-27/ApPipe/prompt-proto-service-latiss-00058

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

Around 20:45 PT, a k8s node went offline with pods stuck in terminating, affecting APDB Postgres, the Knative controller, a running prompt-processing pod, etc.

  • 81 successful runs
  • 5 premature pod shutdown 
  • 3 failed in DIA with problems connecting to APDB
  • 7 not processed because fan out was stuck 
(no change)
2024-03-26


2.2.2 (w_2024_12) (87ceee2)

LATISS/prompt/output-2024-03-26

LATISS/prompt/output-2024-03-26/ApPipe-noForced/prompt-proto-service-latiss-00057

ApPipe-noForced.yaml was used 

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 87 successful runs
  • 1 failed calibrateImage with MeasureApCorrError DM-43306
  • 8 premature pod shutdown DM-41829
(no change)
2024-03-25


2.2.2 (w_2024_12) (87ceee2)

LATISS/prompt/output-2024-03-25

LATISS/prompt/output-2024-03-25/ApPipe/prompt-proto-service-latiss-00056

On 2024-03-22, Kafka and the Knative controller were moved off the Intel NIC they had been on.

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 95 outputs
  • 1 premature pod shutdown DM-41829
(no change)
2024-03-21


2.2.0 (w_2024_12)

(a0d41eae6b)

LATISS/prompt/output-2024-03-21/ApPipe/prompt-proto-service-latiss-00053

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 94 successful runs
  • 2 premature pod shutdown DM-41829
(no change)

2024-03-19



2.1.0 (w_2024_10)

LATISS/prompt/output-2024-03-19/ApPipe/prompt-proto-service-latiss-0

On 2024-03-18, the prompt service was moved off the Intel NIC nodes that could cause dropped pods.


13 AUXTEL_PHOTO_IMAGING nextVisit events, 11 raws

  • 2 were canceled
  • 9 successful runs
  • 2 partial outputs: 1 DM-43247/DM-43277, 1 "Failed to determine psfex psf: too few good stars." DM-43777
(no change)
2024-03-14


2.1.0 (w_2024_10)

LATISS/prompt/output-2024-03-14/ApPipe/prompt-proto-service-latiss-00051

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 94 successful runs
  • 2 premature pod shutdown DM-41829
(no change)
2024-03-13

N/A

2.1.0 (w_2024_10)

LATISS/prompt/output-2024-03-13/ApPipe/prompt-proto-service-latiss-00050

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 91 successful runs
  • 1 failed calibrateImage DM-43248
  • 4 premature pod shutdown DM-41829
(no change)
2024-03-12

N/A

2.1.0 (w_2024_10)

LATISS/prompt/output-2024-03-12/ApPipe/prompt-proto-service-latiss-00049

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 66 successful runs
  • 24 failed calibrateImage with MeasureApCorrError DM-43306
  • 1 failed calibrateImage with DM-43247
  • 5 premature pod shutdown DM-41829
(no change)
2024-03-08

N/A

2.1.0 (w_2024_10)

LATISS/prompt/output-2024-03-08/ApPipe/prompt-proto-service-latiss-00049

97 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 1 was canceled
  • 91 successful runs
  • 5 premature pod shutdown DM-41829
(no change)
2024-03-07

N/A

2.1.0 (02348c99f) (w_2024_10)

LATISS/prompt/output-2024-03-07/ApPipe/prompt-proto-service-latiss-00049

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 90 successful runs, with the new "initial_pvi"
  • 3 partial outputs, calibrateImage failed. DM-43247, DM-43248
  • 3 premature pod shutdown DM-41829
(no change)
2024-03-06

N/A

w_2024_08 (1e350bec)

LATISS/prompt/output-2024-03-06/ApPipe/prompt-proto-service-latiss-00041

65 AUXTEL_PHOTO_IMAGING nextVisit events, 64 raws

  • 1 was canceled 
  • 63 successful runs 
  • 1 premature pod shutdown DM-41829.
(no change)
2024-03-05

N/A

w_2024_08 (1e350bec)

LATISS/prompt/output-2024-03-05/ApPipe/prompt-proto-service-latiss-00041

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 96 successful runs 
(no change)
2024-03-04

N/A

w_2024_08 (1e350bec)

N/A

100 raws, nextVisit did not arrive at USDF on time

(no change)
2024-03-03

N/A

w_2024_08 (1e350bec)

N/A

28 raws, nextVisit did not arrive at USDF on time

(no change)
2024-03-02

N/A

w_2024_08 (1e350bec)

N/A

76 raws, nextVisit did not arrive at USDF on time

(no change)
2024-03-01

N/A

w_2024_08 (1e350bec)

LATISS/prompt/output-2024-03-03/ApPipe/prompt-proto-service-latiss-00040

96 raws, nextVisit did not arrive at USDF on time

One exposure was processed on 2024-03-03 at ~15:46 PT when EFD data and pp Kafka came back online.

(no change)
2024-02-29

N/A

w_2024_08 (1e350bec)

LATISS/prompt/output-2024-02-29/ApPipe/prompt-proto-service-latiss-00040

96 AUXTEL_PHOTO_IMAGING nextVisit events, 96 raws

  • 93 have pipeline outputs 
    • 24 successful runs
    • 69 partial outputs, "WCS fit failed" in CalibrateTask DM-43160
  • 3 premature pod shutdown DM-41829.
(no change)
2024-02-27

N/A

w_2024_08 (1e350bec)

LATISS/prompt/output-2024-02-27/ApPipe/prompt-proto-service-latiss-00040

20 AUXTEL_PHOTO_IMAGING nextVisit events, 20 raws

  • 13 successful runs
  • 7 premature pod shutdown DM-41829.
(no change)
2024-02-26

N/A

w_2024_08 (1e350bec)

LATISS/prompt/output-2024-02-26/ApPipe/prompt-proto-service-latiss-00040

32 AUXTEL_PHOTO_IMAGING nextVisit events, 24 raws

  • 6 groups were canceled
  • 2 groups never arrived; likely failed. 
  • 4 ApPipe outputs
  • 18 partial outputs (postISRCCD). Failed in characterizeImage
  • 2 premature pod shutdown DM-41829. The Knative scale-down delay was extended to 5 min on 02-23, but a network reset can still trip the pods.
(no change)

2024-02-19 

2024-02-20

N/A

w_2024_05 (54e20b0)

N/A

AUXTEL_PHOTO_IMAGING was run, but neither nextVisit events nor data were transferred to USDF: IHS-7703

(no change)
2024-02-13

SUMMIT-8516

w_2024_05 (54e20b0)

LATISS/prompt/output-2024-02-13/ApPipe/prompt-proto-service-latiss-00039    

LATISS/prompt/output-2024-02-13/SingleFrame/prompt-proto-service-latiss-00039

Started using the "latiss_prompt" postgres user to access /repo/embargo

43 AUXTEL_PHOTO_IMAGING nextVisit events, 41 raws

  • 2 were canceled
  • 35 successful runs with either ApPipe or single frame products
  • 6 premature pod shutdown DM-41829


(no change)
2024-02-12

SUMMIT-8516

w_2024_05 (54e20b0)

LATISS/prompt/output-2024-02-12/ApPipe/prompt-proto-service-latiss-00038

LATISS/prompt/output-2024-02-12/SingleFrame/prompt-proto-service-latiss-00038


112 AUXTEL_PHOTO_IMAGING nextVisit events, 105 raws

  • 6 were canceled
  • 1 image never arrived
  • 1 file arrived after timeout
  • 38 ApPipe products
  • 47 single frame products
  • 2 pipeline failures with partial products
  • 17 premature shutdown DM-41829

(no change)
2024-02-08

N/A

w_2024_05 (54e20b0)

LATISS/prompt/output-2024-02-08/ApPipe/prompt-proto-service-latiss-00038

64 AUXTEL_PHOTO_IMAGING nextVisit events, 63 raws

  • 1 image never arrived
  • 45 ApPipe outputs
  • 6 timed out waiting for image DM-39022, DM-42825
  • 12 premature pod shutdown DM-41829
(no change)
2024-02-07

N/A

w_2024_05 (54e20b0)

LATISS/prompt/output-2024-02-07/ApPipe/prompt-proto-service-latiss-00038

100 AUXTEL_PHOTO_IMAGING nextVisit events, 99 raws

  • 1 image never arrived
  • 90 ApPipe outputs
  • 9 premature pod shutdown DM-41829
(no change)
2024-02-06

N/A

d_2024_01_31 (7fbe0199)

LATISS/prompt/output-2024-02-06/ApPipe/prompt-proto-service-latiss-00037

94 AUXTEL_PHOTO_IMAGING nextVisit events, 89 raws

  • 3 were canceled
  • 2 images never arrived
  • 83 ApPipe outputs
  • 8 premature pod shutdown DM-41829 (including 2 while waiting for a canceled visit)
(no change)
2024-02-05

N/A

d_2024_01_31 (7fbe0199)

N/A

16 AUXTEL_PHOTO_IMAGING nextVisit events, 8 raws

  • 6 were canceled
  • 2 images never arrived
  • 8 had mismatched APDB schema (DM-42798)
(no change)
2024-01-31

SUMMIT-8438

DM-42710-d_2024_01_22
(cf97df28)

N/A

1 event was sent and canceled. No image.

(no change)

2024-01-30

SUMMIT-8438

DM-42710-d_2024_01_22
(cf97df28)

LATISS/prompt/output-2024-01-30/ApPipe/prompt-proto-service-latiss-00036

k8s vCluster was upgraded to 1.26.9
Knative was upgraded from 1.11.0 to 1.12.3

60 AUXTEL_PHOTO_IMAGING nextVisit events, 60 raws.

  • 51 ApPipe outputs 
  • 9 premature pod shutdown DM-41829
(no change)
2024-01-29

SUMMIT-8438

DM-42710-d_2024_01_22
(cf97df28) which is d_2024_01_22 plus a pipeline config override to use goodSeeing

LATISS/prompt/output-2024-01-29/ApPipe/prompt-proto-service-latiss-00036

19 AUXTEL_PHOTO_IMAGING nextVisit events, 17 raws

  • 2 were canceled
  • 10 had ApPipe outputs 
  • 5 successfully processed but failed to export because the datasetType goodSeeingDiff_longTrailedSrc was not found in the repo. It was added by hand during the survey.
  • 2 premature pod shutdown. Started processing but didn't finish DM-41829


(new)
goodSeeingCoadd LATISS/runs/AUXTEL_DRP_IMAGING_20230509_20231207/w_2023_49/PREOPS-4648/20231212T162338Z

Continuing from Prompt Processing with AuxTel Imaging Survey Data 2023

...