LUDD Status

Degraded availability for most services (Updated 2020-02-18)

Visit www.ludd.ltu.se for our website

Services

kerberos/ldap auth running. OK
backup systems backing up. OK

disk on backup server somewhat full

dns servers naming things. OK
git.ludd.ltu.se degraded
ircshell.ludd.ltu.se chatting. OK
mail.ludd.ltu.se degraded
core network pushing packets. OK
member servers running. OK

thinlinc up, ssh up

Userdata fileservers degraded. Crashing
userwww.ludd.ltu.se degraded
vortex.ludd.ltu.se membership system. degraded

Tickets

Crashing storage servers causing service disruptions
2020-02-07 00:00:00 +0100

status Investigating
scope Mail, userdata, web services

Description:

An issue with our storage servers is causing them to crash and hang, requiring physical intervention.

Impact:

The servers serve maildirs, userdata, userwww, as well as a few legacy virtual machines handling mail connections, userwww and more.

Update:

* 20200207 00:00 UTC+1
Userdata server starts crashing once a day.
---

* 20171208 18:00 UTC+1
Same symptom on gfs server serving VM data. Manual intervention required every time due to broken out-of-band connection.
---

* 20171217 12:00 UTC+1
Crashing after server hardware changes, problem identified on 12-13 storage servers. Believed to be thermal issues OR kernel incompatibility.
---

Downtime LCNet all servers
2017-12-14 18:00:00 +0100

status Scheduled
scope LCNet
end_date 2017-12-17 22:00:00 +0100

Description:

All racked servers will be moved and recabled between Thursday and Sunday.
This is to prepare the data center for new hardware.
Shelved LCNet servers should not be affected, but might be.

Impact:

Downtime for all LCNet servers.
Risk of downtime on LCNet non-racked servers.

Progress:

Downtime lcnet non-racked servers
2017-12-08 19:00:00 +0100

status Done
scope LCNet/non-racked servers

Description:

All non-racked servers will be moved to a new shelf, and power and network will be recabled.
This is to prepare the data center for new hardware.

Impact:

Downtime for all non-racked lcnet servers

Progress:

20171208 19:00
  Move started.
  
20171211 20:00
  Done! All non-racked servers relocated. Rack servers without rackmounts moved to bottom of lcnet rack.

Expired certificate on weblogon.ltu.se
2017-12-02 00:59:00 +0100

status Fixed
scope New membership

Description:

The certificate for weblogon.ltu.se has expired.

Impact:

This causes issues with verifying new members.

Update:

* 20171202 00:59 UTC+1
weblogon.ltu.se certificate expired.
---

* 20171202 18:00 UTC+1
Certificate was renewed.
---