My Hugging Face Space stays stuck on “Starting”

This is my Hugging Face Space URL: DanbooruSearch - a Hugging Face Space by SAkizuki

I updated the code, which triggered an automatic rebuild. However, the Space keeps showing “Starting” and has stayed that way for a long time. There is only one line in the logs:
===== Application Startup at 2026-03-25 13:23:12 =====
Notably, my code file contains the following snippet:
import sys
sys.stdout.reconfigure(line_buffering=True)
print("==== [Step 1] Script execution started ====", flush=True)
Therefore, I suspect this script is not being executed at all. I have already investigated issues regarding 0.0.0.0 and port 7860. What is causing this problem and how can it be resolved? Thank you!


I’ve tried Factory rebuild and Restart, but they don’t seem to work.


Here is some new information. My log output shows:

===== Application Startup at 2026-03-25 13:50:08 =====

==== [Step 1] Script execution started ====
/usr/local/lib/python3.12/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
NiceGUI ready to go on http://localhost:7860, and http://10.108.61.149:7860
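[Counter] Sync succeeded! Searches: 13793, successful interactions: 10372, copies: 2385, bad_cases: 4
==== [System] Starting warmup of counters and engine ====
[Counter] Initialization complete (hf): searches=13793, visits=23932, copies=2385
[platform_utils] Using HuggingFace Hub model: BAAI/bge-m3
[Engine] Cloud environment (hf), starting to pull data files...
[Engine] Cloud data files pulled.
[Engine] Loading cache (tags_embedding) ...
[Engine] Loading model (path=BAAI/bge-m3, device=cpu)...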
Loading weights:   0%|          | 0/391 [00:00<?, ?it/s]
Loading weights: 100%|██████████| 391/391 [00:00<00:00, 15207.60it/s]
[Engine] Checking for incremental changes...
Current columns: ['name', 'cn_name', 'wiki', 'post_count', 'category', 'nsfw']
[Engine] Data is already up to date; no update needed.
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.744 seconds.
Prefix dict has been built successfully.
[Engine] Loading co-occurrence table (cooccurrence_clean.parquet)...
[Engine] Co-occurrence table loaded: 27,527 tags, took 2.47s
[Engine] Initialization complete, took 37.70s
==== [System] Background warmup fully complete! ====

Everything looks perfectly normal. Moreover, when I access the direct link https://huggingface.co/proxy/sakizuki-danboorusearch.hf.space/, it works perfectly fine.

I read the post HF Space stuck at Starting - #2 by John6666 and followed the steps to check:

  1. Step 1 – Fix host and port in your app code: No issues with this step.
  2. Step 2 – If Docker: align app_port, EXPOSE, and CMD: The corresponding settings are already in the README and Dockerfile.
  3. Step 3 – Ensure / returns something (for non-Gradio apps): Local verification is perfectly normal.
  4. Step 4 – Check for silent crashes and permission issues: Full logs have been obtained and everything looks fine.

In fact, every time I submit code or try to rebuild, it’s always in one of the following two states:

  1. There is only one line in the log: ===== Application Startup at 2026-03-25 xx:xx:xx =====
  2. The log shows the program running normally, and accessing the direct link works fine, but the Space page stays stuck on “Starting”.

I’m getting the same issue


Uh, I didn’t do anything and it suddenly worked. The only thing I did worth mentioning is:
Originally, I ran this in the local command line:

curl -Ik https://huggingface.co/proxy/sakizuki-danboorusearch.hf.space/

The system returned:

HTTP/1.1 405 Method Not Allowed

So, I added this to the code:

    @app.head('/')
    async def head_root():
        return PlainTextResponse("")

Running this command again, the system returned:

HTTP/1.1 200 OK

I don’t know if it’s because my modification was effective, or if it was just normal HF Space network fluctuation.


I tried debugging the code for that Space, but since the issue can’t be explained by the code alone, I think both the app and the platform are to blame.


Bottom line

For SAkizuki/DanbooruSearch, the most likely explanation is two problems overlapping:

  1. A real app-side HTTP/readiness edge case: the root path was returning 405 for HEAD /, and after adding an explicit @app.head('/') handler it returned 200.
  2. A Hugging Face platform-side status/control-plane issue: in the same incident, the app logs looked normal, the direct .hf.space URL worked, and another user reported the same issue, yet the Space page still stayed on Starting. The Space is currently shown as Running. (Hugging Face Forums)

What happened in this case

The thread shows a very specific timeline. At first, the author only saw the platform startup banner in logs and suspected the script was not running. Later, the logs clearly showed the script starting, NiceGUI becoming ready, the engine loading, and full warmup finishing in about 37.70 seconds. The author also said the direct URL https://huggingface.co/proxy/sakizuki-danboorusearch.hf.space/ worked normally even while the Space page still showed Starting. Another forum user replied that they were seeing the same issue. (Hugging Face Forums)

That combination matters. When the direct app URL works and the logs show the app fully initialized, this is not well explained by “the container never started.” It points instead to a mismatch between the app actually running and Hugging Face deciding the Space is healthy enough to flip from Starting to Running. (Hugging Face Forums)

Why the usual “wrong port” answer is probably not the main cause here

For this Space, the current repo wiring is internally consistent:

  • the README metadata says sdk: docker and app_port: 7860
  • the Dockerfile exposes 7860 and runs python ui_nicegui.py
  • platform_utils.py returns 0.0.0.0, 7860 in cloud mode
  • ui.run() uses that host and port. (Hugging Face)

That matches Hugging Face’s Docker Spaces guidance, which documents app_port: 7860 as the default external port for Docker Spaces. So the classic Docker mistake of “the app is bound to the wrong host or wrong port” does not look like the best fit for the final observed state of this Space. (Hugging Face)
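
The wiring check above can be automated. This is a minimal sketch, assuming the README and Dockerfile are available as strings. The helper names (`readme_app_port`, `dockerfile_exposed_ports`, `wiring_is_consistent`) are hypothetical, and the sample contents only mirror the values quoted in this thread, not the full repository files.

```python
import re
from typing import Optional, Set

def readme_app_port(readme_text: str) -> Optional[int]:
    """Return app_port from the README's YAML front matter, or None if absent."""
    # Front matter is the block between the first two '---' lines.
    match = re.match(r"\A---\s*\n(.*?)\n---\s*(\n|\Z)", readme_text, re.DOTALL)
    if not match:
        return None
    port = re.search(r"^app_port:\s*(\d+)\s*$", match.group(1), re.MULTILINE)
    return int(port.group(1)) if port else None

def dockerfile_exposed_ports(dockerfile_text: str) -> Set[int]:
    """Collect every port named in EXPOSE instructions (protocol suffixes ignored)."""
    ports: Set[int] = set()
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if parts and parts[0].upper() == "EXPOSE":
            for token in parts[1:]:
                number = token.split("/")[0]
                if number.isdigit():
                    ports.add(int(number))
    return ports

def wiring_is_consistent(readme_text: str, dockerfile_text: str, bind_port: int) -> bool:
    """True when README app_port, Dockerfile EXPOSE, and the app's bind port agree."""
    # Docker Spaces fall back to 7860 when app_port is not set.
    app_port = readme_app_port(readme_text) or 7860
    return app_port == bind_port and app_port in dockerfile_exposed_ports(dockerfile_text)

# Illustrative excerpts built only from the values quoted in this thread.
SAMPLE_README = "---\nsdk: docker\napp_port: 7860\n---\n"
SAMPLE_DOCKERFILE = 'EXPOSE 7860\nCMD ["python", "ui_nicegui.py"]\n'
```

With these samples, `wiring_is_consistent(SAMPLE_README, SAMPLE_DOCKERFILE, 7860)` is true, which matches the conclusion that port wiring was not the problem here.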

The strongest app-side clue: HEAD / returned 405

This is the most concrete technical clue in the thread. The author says that curl -Ik https://huggingface.co/proxy/sakizuki-danboorusearch.hf.space/ initially returned HTTP/1.1 405 Method Not Allowed. They then added:

@app.head('/')
async def head_root():
    return PlainTextResponse("")

After that, the same curl -Ik returned HTTP/1.1 200 OK. The live code now includes that handler. (Hugging Face Forums)

That matters because NiceGUI is built on FastAPI, and FastAPI has a long-standing issue where a route that handles GET may still return 405 for HEAD unless you add explicit handling. The FastAPI issue tracker describes exactly this behavior and shows manual @app.head("/") as the workaround. So the 405 was not random. It is a known framework-level pitfall. (GitHub)

What that probably means here

The thread does not prove that Hugging Face’s health check definitely uses HEAD /. But it does prove something narrower and more useful: one probeable path on the live app was returning the wrong status for HEAD requests, and fixing that changed the response from 405 to 200 right before the Space recovered. That makes the missing HEAD / support a plausible and important app-side cause for this specific incident. (Hugging Face Forums)
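
The curl check from the thread can also be reproduced from Python. This is a minimal sketch using only the standard library. `probe_status` and `head_get_mismatch` are hypothetical helper names, not part of any Hugging Face tooling.

```python
import urllib.error
import urllib.request

def probe_status(url: str, method: str = "HEAD", timeout: float = 10.0) -> int:
    """Return the HTTP status code the server gives for `method` on `url`."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is still what we want.
        return error.code

def head_get_mismatch(url: str) -> bool:
    """True when GET succeeds but HEAD does not, the pattern reported in this thread."""
    return probe_status(url, "GET") == 200 and probe_status(url, "HEAD") != 200
```

A result of `True` from `head_get_mismatch` corresponds to the reported state in which the page loaded normally while `curl -Ik` returned 405. Because urllib follows redirects, it is safest to probe the final URL directly.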

Why I still would not blame only the app

Because the thread also shows behavior that is hard to explain with only an app bug:

  • logs were sometimes reduced to only the platform banner line
  • logs were later completely normal
  • the direct .hf.space URL worked
  • another user reported the same issue in the same thread
  • the author says they did nothing else and it “suddenly worked.” (Hugging Face Forums)

That pattern strongly suggests that Hugging Face’s status/control layer was at least part of the story. In plain terms, the app may have been fine enough to serve traffic, while the platform UI or health registration temporarily failed to recognize that cleanly. (Hugging Face Forums)

What was probably not the main cause

Not a slow-start timeout

Hugging Face documents startup_duration_timeout as the maximum allowed startup time before a Space is marked unhealthy, with a default of 30 minutes. In the incident logs, the Space finished full warmup in about 37.70 seconds. That is nowhere near the documented default timeout. (Hugging Face)

Probably not a broken Docker launch command

The current repository has the normal Docker pieces in place: sdk: docker, app_port: 7860, EXPOSE 7860, and CMD ["python", "ui_nicegui.py"]. The app also explicitly binds to 0.0.0.0:7860 in cloud mode. That is the opposite of what you usually see in a simple launch misconfiguration. (Hugging Face)

The most likely root-cause picture

The cleanest explanation is this:

  • Primary app-side issue: root HEAD requests were not handled correctly, producing 405 instead of 200.
  • Concurrent platform-side issue: Hugging Face’s status layer appears to have been inconsistent, because the app was sometimes clearly alive while the UI still showed Starting, and another user reported the same symptom. (Hugging Face Forums)

So this was probably not “just your code” and not “just Hugging Face.” It looks more like a real readiness bug plus a platform-state glitch. (Hugging Face Forums)

Solutions for this Space

1. Keep the explicit @app.head('/') handler

This is already in the live code and should stay. It is the most defensible fix because it directly addressed an observed bad response on the live Space. (Hugging Face)

2. Add lightweight /healthz and /readyz endpoints

Right now, the app mounts a FastAPI sub-app at /api, and /api/health exists. But that health route calls DanbooruTagger.get_instance(), which means the health endpoint can itself trigger heavy initialization work. That is not ideal for a liveness check. A better design is:

  • /healthz: returns 200 immediately if the web process is alive
  • /readyz: returns whether the model/data layer is actually ready. (Hugging Face)
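
The split between liveness and readiness could look roughly like this. It is a standalone sketch using only Python's standard library so that it runs without extra dependencies. In the actual Space these would instead be two routes on the existing NiceGUI/FastAPI app. The names `engine_ready`, `warm_up_engine`, `HealthHandler`, and `serve` are hypothetical, and `warm_up_engine` merely stands in for the real initialization behind DanbooruTagger.get_instance().

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once the heavy model/data initialization has finished (hypothetical flag).
engine_ready = threading.Event()

def warm_up_engine() -> None:
    """Stand-in for the real warmup; meant to run in the background, never in a probe."""
    # ... load the model and data files here ...
    engine_ready.set()

class HealthHandler(BaseHTTPRequestHandler):
    def _reply(self, status: int, payload: dict, include_body: bool) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if include_body:
            self.wfile.write(body)

    def _route(self, include_body: bool) -> None:
        if self.path == "/healthz":
            # Liveness: the process can answer HTTP; nothing else is checked.
            self._reply(200, {"status": "alive"}, include_body)
        elif self.path == "/readyz":
            # Readiness: only reads a flag, so probing never triggers loading.
            ready = engine_ready.is_set()
            self._reply(200 if ready else 503, {"ready": ready}, include_body)
        else:
            self._reply(404, {"error": "not found"}, include_body)

    def do_GET(self) -> None:
        self._route(include_body=True)

    def do_HEAD(self) -> None:
        # Answer HEAD explicitly so probes that use it see the same status as GET.
        self._route(include_body=False)

    def log_message(self, format, *args) -> None:
        pass  # keep probe traffic out of the application log

def serve(host: str = "0.0.0.0", port: int = 7860) -> None:
    """Start the warmup in the background, then serve probes immediately."""
    threading.Thread(target=warm_up_engine, daemon=True).start()
    ThreadingHTTPServer((host, port), HealthHandler).serve_forever()
```

The key design choice is that `/readyz` only reads a flag. The warmup runs in its own thread, so a probe arriving early gets a fast 503 instead of blocking on model loading.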

3. Keep Docker wiring aligned exactly as it is

For Docker Spaces, Hugging Face expects the external app_port to match what the app serves. This repo is already aligned on 7860, so this part should be preserved. (Hugging Face)

4. Only use startup_duration_timeout if startup later becomes genuinely slow

If future versions take much longer to initialize, Hugging Face supports startup_duration_timeout in the README metadata. But for this incident, startup time was not the main problem. (Hugging Face)

Practical debugging checklist for the next time this happens

If this Space gets stuck on Starting again, the fastest checks are:

  1. curl -I https://huggingface.co/proxy/sakizuki-danboorusearch.hf.space/
    If this fails or returns 405 again, that is immediately suspicious.
  2. Open the direct .hf.space URL.
    If it works but the Space page still says Starting, the platform status layer is probably lagging reality.
  3. Compare logs with actual reachability.
    If logs look bad but the direct app works, do not assume the app is dead.
  4. Check lightweight health endpoints once they exist.
    That will separate “web server is up” from “engine is ready.” (Hugging Face Forums)
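
The checklist above can be condensed into a small decision helper. This is only a sketch of the reasoning in this thread, not an official diagnostic. The function name `triage` and its parameters are hypothetical, and the inputs are whatever you observed by hand (for example the status from curl -I, and the state shown on the Space page).

```python
from typing import List, Optional

def triage(head_status: Optional[int], direct_url_works: bool, page_state: str) -> List[str]:
    """Map checklist observations to the layers that look suspicious.

    head_status: status code from HEAD on the direct URL, or None if the request failed.
    direct_url_works: whether the direct app URL loads in a browser.
    page_state: the state shown on the Space page, e.g. "Starting" or "Running".
    """
    findings: List[str] = []
    if head_status is None or head_status >= 400:
        findings.append("app-side: HEAD / fails or returns an error status")
    if not direct_url_works:
        findings.append("app-side: direct URL unreachable, inspect logs and port binding")
    elif page_state.strip().lower() == "starting":
        findings.append("platform-side: app serves traffic but the page still says Starting")
    return findings
```

For the incident described here, `triage(405, True, "Starting")` reports both an app-side and a platform-side finding, which is the overlapping picture argued for above.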

Final answer

For SAkizuki/DanbooruSearch, the best explanation is:

  • the app had a real readiness/HTTP compatibility issue at the root path because HEAD / returned 405, and
  • Hugging Face likely had some platform-side status inconsistency at the same time, because the app could still run and serve normally while the Space page stayed on Starting. (Hugging Face Forums)

The Space is now Running, and the live repo already contains the main defensive fix: explicit HEAD / support. The next useful hardening step is to add separate lightweight health and readiness endpoints so that future incidents are easier to diagnose. (Hugging Face)

