Yingdao RPA (影刀RPA) Hands-On Guide: Weibo Topic Sentiment Monitoring, with Automated Keyword Tracking and Sentiment Classification


张建站

2026/6/13 21:28:54

10-minute read

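Brand sentiment, competitor monitoring, industry topic tracking: all of these used to mean someone refreshing Weibo by hand every day. With Yingdao RPA (影刀RPA) the whole loop can run automatically: scheduled collection, keyword filtering, sentiment classification, and anomaly alerts.

Characteristics of Weibo data collection

Before starting, it helps to understand how Weibo's anti-scraping behavior shapes the flow:

| Characteristic | Impact |
| --- | --- |
| Full content is visible only when logged in | The login state must be maintained |
| High-frequency requests trigger human verification | The collection interval must not be too short |
| Search results are sorted newest first | Incremental collection must record the time of the last item collected |
| Some content is shown collapsed | An extra action is needed to expand it |

Preparation: maintaining the login state

Weibo's login includes a slider CAPTCHA. The recommended approach is to log in manually once before the flow starts, then save the cookies:

    import json
    import os

    COOKIE_PATH = r"C:\配置\weibo_cookies.json"

    def save_weibo_cookies(browser):
        """Save the Weibo cookies to a local file."""
        cookies = browser.get_cookies()  # all cookies held by the browser
        with open(COOKIE_PATH, "w", encoding="utf-8") as f:
            json.dump(cookies, f)

    def load_weibo_cookies(browser):
        """Load previously saved cookies; return False if none exist."""
        if not os.path.exists(COOKIE_PATH):
            return False  # no cookies yet: a manual login is required
        with open(COOKIE_PATH, encoding="utf-8") as f:
            cookies = json.load(f)
        # Open the Weibo domain first so the cookies attach to it
        browser.open("https://weibo.com")
        # Inject the cookies
        for cookie in cookies:
            browser.add_cookie(cookie)
        browser.refresh()  # reload so the cookies take effect
        return True

![Image](https://i-blog.csdnimg.cn/direct/4a3cea21bd8e48a59734af26d92d74f3.png#pic_center)

Cookie lifetime: Weibo cookies are usually valid for 7 to 30 days. Once they expire, the site redirects to the login page. Check the login status inside the flow and, when it has expired, send an alert so that someone logs in again manually.

Collecting search topics

Weibo search URL formats:

    Keyword search:
    https://s.weibo.com/weibo?q={keyword}&typeall=1&suball=1&timescope=custom:{start_time}:{end_time}&page={page}

    Topic search:
    https://s.weibo.com/weibo?q=%23{topic}%23

![Image](https://i-blog.csdnimg.cn/direct/cd276d7a4fd744c7baab3e9903c313ce.png#pic_center)

Time format: 2026-06-10-0:2026-06-11-0 (start time:end time, with hour precision).

Key XPaths

The core elements of the Weibo search results page:

    # Container of each post
    //div[@action-type="feed_list_item"]
    # Author nickname
    .//a[@class="name"]
    # Post body
    .//p[@node-type="feed_list_content"]
    # Publish time
    .//p[@node-type="feed_list_content_full"]/../following-sibling::*//*[@class="from"]//a[1]
    # Repost count
    .//li[@action-type="feed_list_forward"]//span[@class="woo-number-count"]
    # Comment count
    .//li[@action-type="feed_list_comment"]//span[@class="woo-number-count"]
    # Like count
    .//li[@action-type="feed_list_like"]//span[@class="woo-number-count"]

![Image](https://i-blog.csdnimg.cn/direct/38ad2e00da4949f9884a04511e0f3e59.png#pic_center)

Note: Weibo's front-end code changes frequently, so these XPaths may stop matching. Prefer contains() over exact matching on class names.

Incremental collection logic

Each run collects only posts newer than the previous run, which avoids duplicates. The comparison below only works if every post's time value is stored in a sortable format.

    import json
    import os

    def load_last_cursor(keyword):
        """Read the latest post time recorded by the previous run."""
        cursor_file = fr"C:\数据\微博_游标_{keyword}.json"
        if os.path.exists(cursor_file):
            with open(cursor_file, encoding="utf-8") as f:
                return json.load(f).get("last_time")
        return None  # first run: no cursor yet

    def save_cursor(keyword, last_time):
        """Save the latest post time seen in this run."""
        cursor_file = fr"C:\数据\微博_游标_{keyword}.json"
        with open(cursor_file, "w", encoding="utf-8") as f:
            json.dump({"last_time": last_time}, f)

    # Collection logic. collect_page and random_wait are helpers that this
    # article does not define; they are expected to exist elsewhere in the flow.
    def collect_incremental(keyword):
        last_cursor = load_last_cursor(keyword)
        new_data = []
        reached_old = False
        for page in range(1, 51):  # at most 50 pages
            weibos = collect_page(keyword, page)
            new_in_page = []
            for weibo in weibos:
                # Results are newest first, so a post at or before the
                # cursor means everything after it is old data
                if last_cursor and weibo["time"] <= last_cursor:
                    reached_old = True
                    break
                new_in_page.append(weibo)
            # Keep the new posts of this page before deciding to stop
            new_data.extend(new_in_page)
            if reached_old or not new_in_page:
                break  # stop paging
            random_wait(2, 5)
        # Save the newest time as the cursor for the next run
        if new_data:
            save_cursor(keyword, max(w["time"] for w in new_data))
        return new_data

![Image](https://i-blog.csdnimg.cn/direct/971ab52bd14b44c786d2b8e853c995aa.png#pic_center)

Simple sentiment classification

Basic sentiment classification does not need an AI API; keyword rules are enough:

    def classify_sentiment(text):
        """Keyword-based sentiment classification."""
        negative_words = ["差评", "垃圾", "投诉", "骗", "假货", "售后差", "踩雷",
                          "后悔", "不推荐", "不好", "避雷", "翻车", "维权"]
        positive_words = ["好评", "推荐", "喜欢", "满意", "不错", "五星", "购买了",
                          "安利", "真香", "值得", "惊喜"]
        # Count how many keywords of each list occur in the text
        neg_score = sum(1 for w in negative_words if w in text)
        pos_score = sum(1 for w in positive_words if w in text)
        if neg_score > pos_score:
            return "negative"
        elif pos_score > neg_score:
            return "positive"
        else:
            return "neutral"

Because the match is a plain substring test, a negative phrase such as 不推荐 also hits the positive keyword 推荐, so a text containing only that phrase scores as a tie and is classified as neutral.

Going further: if more accurate sentiment analysis is needed, call the sentiment analysis APIs offered by Baidu, Tencent, or Alibaba. A Python code block in Yingdao can send the HTTP requests directly.

Alert thresholds

    # Alert check. send_wecom_message is not defined in this article.
    def check_alert(keyword, data):
        negative_count = sum(1 for d in data if d["sentiment"] == "negative")
        total_count = len(data)
        if total_count == 0:
            return
        neg_ratio = negative_count / total_count
        # Alert when the negative ratio exceeds 30%, or when more than 50
        # negative posts arrive in one hour (assumes data covers one hour)
        if neg_ratio > 0.3 or negative_count > 50:
            send_wecom_message(
                f"⚠️ Sentiment alert\n"
                f"Keyword: {keyword}\n"
                f"New posts: {total_count}, negative: {negative_count} ({neg_ratio:.0%})\n"
                f"Please follow up promptly"
            )

#影刀RPA #RPA自动化 #舆情监控 #微博采集 #品牌监控

Author: Lin Yan (林焱). This article is part of the 《影刀RPA学习手册》 (Yingdao RPA Study Handbook) series; its content is compiled and shared from hands-on experience.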
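The search URL formats described under "Collecting search topics" can be assembled programmatically. The following is a minimal sketch, assuming the query parameters are joined in the usual key=value&key=value form; the function names (format_timescope, build_keyword_url, build_topic_url) are hypothetical and do not come from the article or from Yingdao.

```python
from datetime import datetime
from urllib.parse import quote

SEARCH_BASE = "https://s.weibo.com/weibo"


def format_timescope(start: datetime, end: datetime) -> str:
    """Format a time window as custom:YYYY-MM-DD-H:YYYY-MM-DD-H (hour precision)."""
    def fmt(dt: datetime) -> str:
        # The hour is written without zero padding, as in 2026-06-10-0
        return f"{dt:%Y-%m-%d}-{dt.hour}"
    return f"custom:{fmt(start)}:{fmt(end)}"


def build_keyword_url(keyword: str, start: datetime, end: datetime, page: int = 1) -> str:
    """Build a keyword search URL restricted to a time window and page."""
    return (
        f"{SEARCH_BASE}?q={quote(keyword)}"
        f"&typeall=1&suball=1"
        f"&timescope={format_timescope(start, end)}"
        f"&page={page}"
    )


def build_topic_url(topic: str) -> str:
    """Build a topic search URL; quote() encodes the surrounding '#' as %23."""
    return f"{SEARCH_BASE}?q={quote('#' + topic + '#')}"


if __name__ == "__main__":
    window = (datetime(2026, 6, 10, 0), datetime(2026, 6, 11, 0))
    print(build_keyword_url("brand", *window, page=2))
    print(build_topic_url("brand"))
```

Percent-encoding the keyword with quote() keeps Chinese keywords and the # of topic names safe inside the URL, while the colons in the timescope value are left as they appear in the article's example.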
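The incremental collection logic compares each post's time with the saved cursor, which only works if the time values sort correctly as stored. The sketch below normalizes display times into a sortable string. It assumes the results page shows times in forms such as 刚刚, 30秒前, 5分钟前, 今天 08:05, 06月10日 12:30 and 2025年12月31日 23:59; these formats are an assumption that should be checked against the live page, and parse_weibo_time and to_cursor are hypothetical names.

```python
import re
from datetime import datetime, timedelta


def parse_weibo_time(raw: str, now: datetime) -> datetime:
    """Convert a displayed Weibo time string into a datetime.

    The accepted formats are assumptions about the search results page,
    not a documented contract.
    """
    text = raw.strip()
    if text == "刚刚":  # "just now"
        return now
    m = re.fullmatch(r"(\d+)秒前", text)  # "N seconds ago"
    if m:
        return now - timedelta(seconds=int(m[1]))
    m = re.fullmatch(r"(\d+)分钟前", text)  # "N minutes ago"
    if m:
        return now - timedelta(minutes=int(m[1]))
    m = re.fullmatch(r"今天\s*(\d{1,2}):(\d{2})", text)  # "today HH:MM"
    if m:
        return now.replace(hour=int(m[1]), minute=int(m[2]), second=0, microsecond=0)
    # "MM月DD日 HH:MM", optionally prefixed with "YYYY年"
    m = re.fullmatch(r"(?:(\d{4})年)?(\d{1,2})月(\d{1,2})日\s*(\d{1,2}):(\d{2})", text)
    if m:
        year = int(m[1]) if m[1] else now.year  # no year shown: assume the current year
        return datetime(year, int(m[2]), int(m[3]), int(m[4]), int(m[5]))
    raise ValueError(f"Unrecognized time format: {raw!r}")


def to_cursor(dt: datetime) -> str:
    """Render a datetime as a string whose text order matches time order."""
    return dt.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    now = datetime(2026, 6, 13, 21, 28, 54)
    for sample in ["刚刚", "5分钟前", "今天 08:05", "06月10日 12:30"]:
        print(sample, "->", to_cursor(parse_weibo_time(sample, now)))
```

Storing to_cursor(...) in each post's time field lets both the cursor comparison and max() work on plain strings, and the cursor file stays JSON-serializable.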
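The alert check calls send_wecom_message, which the article does not define. One possible shape, assuming the alert goes to a WeCom (企业微信) group robot webhook, is sketched below. The webhook key is a placeholder, build_wecom_payload is a hypothetical helper, and this is not necessarily how the author implemented it.

```python
import json
import urllib.request

# Placeholder: replace YOUR_KEY with the key of your own group robot
WECOM_WEBHOOK_URL = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=YOUR_KEY"


def build_wecom_payload(content: str) -> dict:
    """Build the JSON body of a plain-text group robot message."""
    return {"msgtype": "text", "text": {"content": content}}


def send_wecom_message(content: str, webhook_url: str = WECOM_WEBHOOK_URL, timeout: int = 10) -> dict:
    """POST a text message to the webhook and return the decoded JSON reply."""
    body = json.dumps(build_wecom_payload(content), ensure_ascii=False).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the payload construction separate from the network call makes the message format easy to check without sending anything; the caller can inspect the returned reply to confirm that delivery succeeded.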
