A while ago, I integrated TianliGPT into my articles to auto-generate a “TL;DR” summary block. TianliGPT offers a super-simple drop-in snippet, but it only does summarization and related-article recommendations, and there's little room to extend it. So I recently ditched TianliGPT and switched to Moonshot AI for both the summary and some extra features.
Defining the requirements
Besides generating a summary, we also want to pose relevant questions (and their answers) based on the article. Clicking a question reveals its answer—something like the image below:

Preview of the article-summary module
To meet that need, the model must return JSON like this for the front-end to consume:
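Something along these lines works; the `summary` and `qa` field names here are just our own convention:

```json
{
  "summary": "A one-paragraph TL;DR of the article.",
  "qa": [
    {
      "question": "A question grounded in the article",
      "answer": "Its answer, also taken from the article"
    },
    {
      "question": "Another question",
      "answer": "Another answer"
    }
  ]
}
```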
So we craft the following prompt:
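Roughly like this (treat the wording as a sketch; the essentials are the fixed JSON keys and the JSON-only requirement):

```text
You summarize blog articles. Given the article below, reply with a JSON
object containing exactly two keys:
- "summary": a TL;DR of the article in at most 150 words;
- "qa": an array of 3 objects, each with a "question" and its "answer",
  both derived strictly from the article's content.
Output the JSON only, with no surrounding commentary.
```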
> [!NOTE]
> A prompt is the information or instruction given to the model to guide its response. An effective prompt needs clarity, relevance, brevity, context, and instructiveness.
Since Kimi and Moonshot AI share the same backbone, we can test with Kimi to preview what the Moonshot API will return (and save some money). After chatting with Kimi using the prompt above, the result looks like this:

Model chat result
Talking to the model
With the “what” settled, we tackle the “how”. Moonshot’s docs provide Python and Node.js samples; here we’ll do it in PHP.
The official Chat Completions endpoint is https://api.moonshot.cn/v1/chat/completions. Headers and body look like:
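Concretely, a request looks like this (replace `$MOONSHOT_API_KEY` with your actual key; the message contents are abridged):

```http
POST /v1/chat/completions HTTP/1.1
Host: api.moonshot.cn
Content-Type: application/json
Authorization: Bearer $MOONSHOT_API_KEY

{
  "model": "moonshot-v1-8k",
  "temperature": 0.3,
  "messages": [
    {"role": "system", "content": "You summarize blog articles. ..."},
    {"role": "user", "content": "<article text>"}
  ]
}
```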
- `model`: pick `moonshot-v1-8k`, `moonshot-v1-32k`, or `moonshot-v1-128k`.
- `messages`: the array of chat turns. Roles are `system`, `user`, or `assistant`.
- `temperature`: sampling temperature; `0.3` is recommended.
We wrap this in a MoonshotAPI class:
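A minimal sketch of the class, using cURL (the class and method names are our own; the endpoint, headers, and body fields come from the docs):

```php
<?php

class MoonshotAPI
{
    private const ENDPOINT = 'https://api.moonshot.cn/v1/chat/completions';

    private string $apiKey;

    public function __construct(string $apiKey)
    {
        $this->apiKey = $apiKey;
    }

    /**
     * Send a chat-completion request and return the decoded response,
     * including error bodies (we only throw on transport failures).
     */
    public function chat(array $messages, string $model = 'moonshot-v1-8k'): array
    {
        $payload = [
            'model'           => $model,
            'messages'        => $messages,
            'temperature'     => 0.3,
            // JSON Mode: force the model to emit a single parseable JSON document.
            'response_format' => ['type' => 'json_object'],
        ];

        $ch = curl_init(self::ENDPOINT);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_POST           => true,
            CURLOPT_HTTPHEADER     => [
                'Content-Type: application/json',
                'Authorization: Bearer ' . $this->apiKey,
            ],
            CURLOPT_POSTFIELDS     => json_encode($payload),
        ]);

        $response = curl_exec($ch);
        if ($response === false) {
            throw new RuntimeException('cURL error: ' . curl_error($ch));
        }
        curl_close($ch);

        return json_decode($response, true);
    }
}
```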
> [!NOTE]
> If you simply tell Kimi “please output JSON”, it will, sort of: you'll usually get extra explanatory text wrapped around the JSON. Enabling JSON Mode (`'response_format' => ["type" => "json_object"]`) forces a clean, parseable JSON document.
The only tricky parameter is $messages, so we add a helper:
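A small builder keeps the role order straight (a sketch; `build_messages` is our own helper name):

```php
<?php

/**
 * Build the chronological messages array: the prompt as the system turn,
 * the article as the first user turn.
 */
function build_messages(string $prompt, string $article): array
{
    return [
        ['role' => 'system', 'content' => $prompt],
        ['role' => 'user',   'content' => $article],
    ];
}
```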
We put our prompt in the `system` message and the article in the first `user` message. The messages array is chronological: `system` → `user` → `assistant`, keeping context intact.
The model returns JSON like this; we care about the choices array:
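Trimmed to the relevant fields (the values here are illustrative), a response has roughly this shape:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "moonshot-v1-8k",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"summary\": \"...\", \"qa\": [...]}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 1024,
    "completion_tokens": 256,
    "total_tokens": 1280
  }
}
```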
In our flow `choices` has exactly one element (hence the hard-coded `$result['choices'][0]` below). `finish_reason` tells us whether the answer is complete (`stop`), and `message.content` holds the actual reply.
Now glue it together:
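Using the class and helper sketched above (assuming `$prompt` and `$article` hold the prompt and article text):

```php
<?php

// Assumes MOONSHOT_API_KEY is set in the environment,
// and $prompt / $article hold the prompt and article text.
$api      = new MoonshotAPI(getenv('MOONSHOT_API_KEY'));
$messages = build_messages($prompt, $article);

$result = $api->chat($messages);
$choice = $result['choices'][0];

if ($choice['finish_reason'] !== 'stop') {
    // The reply was cut off (e.g. by the token limit); handle as needed.
    throw new RuntimeException('Incomplete completion: ' . $choice['finish_reason']);
}

// Thanks to JSON Mode, content is a clean JSON document.
$data = json_decode($choice['message']['content'], true);
// $data['summary'] and $data['qa'] are now ready for the front-end.
```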
That’s it—call it and you’ll get the model’s reply ✌️. The front-end can render it however it likes; we’ll skip that part.
Extra: handling very long articles
For some articles you may hit:
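The error body has this general shape (the message text here is illustrative; only the `invalid_request_error` type matters for our handling):

```json
{
  "error": {
    "type": "invalid_request_error",
    "message": "Invalid request: your prompt and expected completion exceed the model's maximum context length"
  }
}
```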
The combined prompt + response tokens exceed the model's limit (our default here is `moonshot-v1-8k`), so we need to pick a bigger model. The docs give sample code for choosing the right model; we port it to PHP:
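A PHP port might look like the following. It assumes the token-estimation endpoint the sample code relies on (`/v1/tokenizers/estimate-token-count`, returning `data.total_tokens`); the 1,536-token headroom for the completion is our own guess:

```php
<?php

/**
 * Estimate the token count of $messages, then pick the smallest model
 * whose context window can hold the prompt plus some completion headroom.
 */
function pick_model(string $apiKey, array $messages): string
{
    $ch = curl_init('https://api.moonshot.cn/v1/tokenizers/estimate-token-count');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_HTTPHEADER     => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . $apiKey,
        ],
        CURLOPT_POSTFIELDS     => json_encode([
            'model'    => 'moonshot-v1-8k',
            'messages' => $messages,
        ]),
    ]);
    $raw = curl_exec($ch);
    curl_close($ch);
    if ($raw === false) {
        throw new RuntimeException('Token estimation request failed');
    }

    $tokens = json_decode($raw, true)['data']['total_tokens'];

    // Reserve headroom for the completion itself (1,536 tokens is our guess).
    if ($tokens + 1536 <= 8 * 1024) {
        return 'moonshot-v1-8k';
    }
    if ($tokens + 1536 <= 32 * 1024) {
        return 'moonshot-v1-32k';
    }
    return 'moonshot-v1-128k';
}
```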
Alternatively, around our `MoonshotAPI` wrapper: when a call comes back with `invalid_request_error`, we retry with the next bigger model:
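A sketch of that retry, written as a standalone function around the `chat()` method from earlier (this works because `chat()` returns decoded error bodies instead of throwing on them):

```php
<?php

/**
 * Try each model in ascending context size until the request fits.
 */
function chat_with_fallback(MoonshotAPI $api, array $messages): array
{
    foreach (['moonshot-v1-8k', 'moonshot-v1-32k', 'moonshot-v1-128k'] as $model) {
        $result = $api->chat($messages, $model);

        // Context overflow comes back as invalid_request_error; step up and retry.
        if (($result['error']['type'] ?? null) === 'invalid_request_error') {
            continue;
        }
        return $result;
    }
    throw new RuntimeException('Article too long even for moonshot-v1-128k');
}
```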