Merge pull request #1 from soimort/develop

Commit c3fd8d51c0, authored by const-zhou on 2020-12-09 15:52:38 +08:00 and committed by GitHub (GPG Key ID: 4AEE18F83AFDEB23; no known key found for this signature in the database).
104 changed files with 7461 additions and 3154 deletions


@ -1,39 +0,0 @@
Please make sure these boxes are checked before submitting your issue. Thank you!
- [ ] You can actually watch the video in your browser or mobile application, but cannot download it with `you-get`.
- [ ] Your `you-get` is up-to-date.
- [ ] I have read <https://github.com/soimort/you-get/wiki/FAQ> and followed the suggestions there.
- [ ] The issue is not yet reported on <https://github.com/soimort/you-get/issues> or <https://github.com/soimort/you-get/wiki/Known-Bugs>. If so, please add your comments under the existing issue.
- [ ] The issue (or question) is really about `you-get`, not about some other code or project.
Run the command with the `--debug` option, and paste the full output inside the fences:
```
[PASTE IN ME]
```
If there's anything else you would like to say (e.g. your issue is not about downloading a specific video, but rather a general discussion or a proposal for a new feature), fill in the box below; otherwise, you may want to post an emoji or meme instead:
> [WRITE SOMETHING]
> [OR HAVE SOME :icecream:!]
Chinese translation last updated: 2016-02-26
Before submitting, please make sure you have checked the following!
- [ ] You can watch the video in your browser or mobile app, but cannot download it with `you-get`.
- [ ] Your `you-get` is up-to-date.
- [ ] I have read <https://github.com/soimort/you-get/wiki/FAQ> and followed the instructions there.
- [ ] The issue has not already been reported on <https://github.com/soimort/you-get/issues>, <https://github.com/soimort/you-get/wiki/FAQ> or <https://github.com/soimort/you-get/wiki/Known-Bugs>; if it has, please comment under the existing issue instead.
- [ ] The issue is really about `you-get`, not about some other project.
Run the command with `--debug` and paste the full output below:
```
[PASTE THE FULL LOG HERE]
```
If you have anything else to add (e.g. the problem only occurs with a certain video, or this is a general discussion or a new feature proposal), write it below; or feel free to be cute:
> [YOUR CONTENT]
> [OR LICK THE :icecream:!]


@ -1,48 +0,0 @@
**(PLEASE DELETE ALL THESE AFTER READING)**
Thank you for the pull request! `you-get` is a growing open source project, which would not have been possible without contributors like you.
Here are some simple rules to follow, please recheck them before sending the pull request:
- [ ] If you want to propose two or more unrelated patches, please open separate pull requests for them, instead of one;
- [ ] All pull requests should be based upon the latest `develop` branch;
- [ ] Name your branch (from which you will send the pull request) properly; use a meaningful name like `add-this-shining-feature` rather than just `develop`;
- [ ] All commit messages, as well as comments in code, should be written in understandable English.
As a contributor, you must be aware that
- [ ] You agree to contribute your code to this project, under the terms of the MIT license, so that anyone may freely use or redistribute it; of course, you will still retain the copyright for your own authorship.
- [ ] You may not contribute any code not authored by yourself, unless it is in the public domain or licensed under the MIT license.
Not all pull requests can eventually be merged. I consider merged / unmerged patches as equally important for the community: as long as you think a patch would be helpful, someone else might find it helpful, too, therefore they could take your fork and benefit in some way. In any case, I would like to thank you in advance for taking your time to contribute to this project.
Cheers,
Mort
**(PLEASE REPLACE ALL ABOVE WITH A DETAILED DESCRIPTION OF YOUR PULL REQUEST)**
Chinese translation last updated: 2016-02-26
**(PLEASE DELETE ALL OF THIS AFTER READING)**
Thank you for the pull request! `you-get` is a steadily growing open source project, and your contribution is appreciated.
Please recheck the following simple checklist items:
- [ ] If you intend to propose two or more unrelated patches, please open a separate pull request for each of them rather than a single one;
- [ ] All pull requests should be based on the latest `develop` branch;
- [ ] The branch from which you send the pull request should have a meaningful name, e.g. `add-this-shining-feature` rather than just `develop`;
- [ ] All commit messages and code comments should be written in understandable English.
As a contributor, you should be aware that
- [ ] You agree to contribute your code under the MIT license, so that anyone may freely use or redistribute it; of course, you still retain the copyright of your own work.
- [ ] You may not contribute code that you did not write yourself, unless it is in the public domain or licensed under the MIT license.
Not all pull requests will be merged; however, I consider merged and unmerged patches equally important: if you think a patch is useful, others may find it useful too, and they can take your fork and benefit from it. In any case, thank you for taking the time to contribute to this project.
Cheers,
Mort
**(PLEASE REPLACE ALL OF THIS WITH A DETAILED DESCRIPTION OF YOUR PULL REQUEST)**

.github/workflows/python-package.yml (vendored, new file, 39 lines)

@ -0,0 +1,39 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
name: develop
on:
push:
branches: [ develop ]
pull_request:
branches: [ develop ]
jobs:
build:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.5, 3.6, 3.7, 3.8, pypy3]
steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install flake8 pytest
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Test with unittest
run: |
make test

.gitignore (vendored, 8 lines changed)

@ -81,3 +81,11 @@ _*
*.xml
/.env
/.idea
*.m4a
*.DS_Store
*.txt
*.zip
.vscode


@ -1,18 +0,0 @@
# https://travis-ci.org/soimort/you-get
language: python
python:
- "3.2"
- "3.3"
- "3.4"
- "3.5"
- "nightly"
- "pypy3"
script: make test
sudo: false
notifications:
webhooks:
urls:
- https://webhooks.gitter.im/e/43cd57826e88ed8f2152
on_success: change # options: [always|never|change] default: always
on_failure: always # options: [always|never|change] default: always
on_start: never # options: [always|never|change] default: always

CONTRIBUTING.md (new file, 27 lines)

@ -0,0 +1,27 @@
# How to Report an Issue
If you would like to report a problem you find when using `you-get`, please open a [Pull Request](https://github.com/soimort/you-get/pulls), which should include:
1. A detailed description of the encountered problem;
2. At least one commit, addressing the problem through some unit test(s).
* Examples of good commits: [#2675](https://github.com/soimort/you-get/pull/2675/files), [#2680](https://github.com/soimort/you-get/pull/2680/files), [#2685](https://github.com/soimort/you-get/pull/2685/files)
PRs that fail to meet the above criteria may be closed summarily with no further action.
A valid PR will remain open until the problem it addresses is fixed.
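As an illustration only, a commit of this kind might add a test along these lines (a hypothetical sketch in the `unittest` style of the project's `tests/test.py`; the URL and extractor below are placeholders, not part of this commit):

```python
# Hypothetical sketch of a unit test accompanying an issue report.
# Replace the URL with the one that fails for you; info_only=True asks the
# extractor to parse the page and print stream info without downloading.
import unittest
from you_get.extractors import youtube

class TestReportedProblem(unittest.TestCase):
    def test_extraction(self):
        youtube.download('https://www.youtube.com/watch?v=jNQXAC9IVRw', info_only=True)

if __name__ == '__main__':
    unittest.main()
```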
# How to Report an Issue
To prevent abuse of GitHub Issues, this project does not accept general issues.
If you find any problem while using `you-get`, please open a [Pull Request](https://github.com/soimort/you-get/pulls). The PR should include:
1. A detailed description of the problem;
2. At least one commit consisting of unit test(s) **related to the problem**. **Do not submit a PR by making arbitrary changes to unrelated files.**
* Examples of good commits: [#2675](https://github.com/soimort/you-get/pull/2675/files), [#2680](https://github.com/soimort/you-get/pull/2680/files), [#2685](https://github.com/soimort/you-get/pull/2685/files)
PRs that do not meet the above criteria may be closed directly.
A valid PR will remain open until the corresponding problem is fixed.


@ -1,15 +1,15 @@
==============================================
This is a copy of the MIT license.
==============================================
Copyright (C) 2012, 2013, 2014, 2015, 2016 Mort Yao <mort.yao@gmail.com>
Copyright (C) 2012 Boyu Guo <iambus@gmail.com>
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
Copyright (c) 2012-2020 Mort Yao <mort.yao@gmail.com> and other contributors
(https://github.com/soimort/you-get/graphs/contributors)
Copyright (c) 2012 Boyu Guo <iambus@gmail.com>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

README.md (173 lines changed)

@ -1,22 +1,32 @@
# You-Get
[![Build Status](https://github.com/soimort/you-get/workflows/develop/badge.svg)](https://github.com/soimort/you-get/actions)
[![PyPI version](https://img.shields.io/pypi/v/you-get.svg)](https://pypi.python.org/pypi/you-get/)
[![Build Status](https://travis-ci.org/soimort/you-get.svg)](https://travis-ci.org/soimort/you-get)
[![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/soimort/you-get?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
**NOTICE: Read [this](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md) if you are looking for the conventional "Issues" tab.**
---
[You-Get](https://you-get.org/) is a tiny command-line utility to download media contents (videos, audios, images) from the Web, in case there is no other handy way to do it.
Here's how you use `you-get` to download a video from [this web page](http://www.fsf.org/blogs/rms/20140407-geneva-tedx-talk-free-software-free-society):
Here's how you use `you-get` to download a video from [YouTube](https://www.youtube.com/watch?v=jNQXAC9IVRw):
```console
$ you-get http://www.fsf.org/blogs/rms/20140407-geneva-tedx-talk-free-software-free-society
Site: fsf.org
Title: TEDxGE2014_Stallman05_LQ
Type: WebM video (video/webm)
Size: 27.12 MiB (28435804 Bytes)
$ you-get 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site: YouTube
title: Me at the zoo
stream:
- itag: 43
container: webm
quality: medium
size: 0.5 MiB (564215 bytes)
# download-with: you-get --itag=43 [URL]
Downloading TEDxGE2014_Stallman05_LQ.webm ...
100.0% ( 27.1/27.1 MB) ├████████████████████████████████████████┤[1/1] 12 MB/s
Downloading Me at the zoo.webm ...
100% ( 0.5/ 0.5MB) ├██████████████████████████████████┤[1/1] 6 MB/s
Saving Me at the zoo.en.srt ... Done.
```
And here's why you might want to use it:
@ -43,10 +53,10 @@ Are you a Python programmer? Then check out [the source](https://github.com/soim
### Prerequisites
The following dependencies are required and must be installed separately, unless you are using a pre-built package or chocolatey on Windows:
The following dependencies are necessary:
* **[Python 3](https://www.python.org/downloads/)**
* **[FFmpeg](https://www.ffmpeg.org/)** (strongly recommended) or [Libav](https://libav.org/)
* **[Python](https://www.python.org/downloads/)** 3.2 or above
* **[FFmpeg](https://www.ffmpeg.org/)** 1.0 or above
* (Optional) [RTMPDump](https://rtmpdump.mplayerhq.hu/)
### Option 1: Install via pip
@ -55,17 +65,13 @@ The official release of `you-get` is distributed on [PyPI](https://pypi.python.o
$ pip3 install you-get
### Option 2: Install via [Antigen](https://github.com/zsh-users/antigen)
### Option 2: Install via [Antigen](https://github.com/zsh-users/antigen) (for Zsh users)
Add the following line to your `.zshrc`:
antigen bundle soimort/you-get
### Option 3: Use a pre-built package (Windows only)
Download the `exe` (standalone) or `7z` (all dependencies included) from: <https://github.com/soimort/you-get/releases/latest>.
### Option 4: Download from GitHub
### Option 3: Download from GitHub
You may either download the [stable](https://github.com/soimort/you-get/archive/master.zip) (identical with the latest release on PyPI) or the [develop](https://github.com/soimort/you-get/archive/develop.zip) (more hotfixes, unstable features) branch of `you-get`. Unzip it, and put the directory containing the `you-get` script into your `PATH`.
@ -83,7 +89,7 @@ $ python3 setup.py install --user
to install `you-get` to a permanent path.
### Option 5: Git clone
### Option 4: Git clone
This is the recommended way for all developers, even if you don't often code in Python.
@ -93,13 +99,7 @@ $ git clone git://github.com/soimort/you-get.git
Then put the cloned directory into your `PATH`, or run `./setup.py install` to install `you-get` to a permanent path.
### Option 6: Using [Chocolatey](https://chocolatey.org/) (Windows only)
```
> choco install you-get
```
### Option 7: Homebrew (Mac only)
### Option 5: Homebrew (Mac only)
You can install `you-get` easily via:
@ -107,9 +107,17 @@ You can install `you-get` easily via:
$ brew install you-get
```
### Option 6: pkg (FreeBSD only)
You can install `you-get` easily via:
```
# pkg install you-get
```
### Shell completion
Completion definitions for Bash, Fish and Zsh can be found in [`contrib/completion`](contrib/completion). Please consult your shell's manual for how to take advantage of them.
Completion definitions for Bash, Fish and Zsh can be found in [`contrib/completion`](https://github.com/soimort/you-get/tree/develop/contrib/completion). Please consult your shell's manual for how to take advantage of them.
## Upgrading
@ -125,12 +133,6 @@ or download the latest release via:
$ you-get https://github.com/soimort/you-get/archive/master.zip
```
or use [chocolatey package manager](https://chocolatey.org):
```
> choco upgrade you-get
```
In order to get the latest ```develop``` branch without messing up the PIP, you can try:
```
@ -148,22 +150,54 @@ $ you-get -i 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site: YouTube
title: Me at the zoo
streams: # Available quality and codecs
[ DASH ] ____________________________________
- itag: 242
container: webm
quality: 320x240
size: 0.6 MiB (618358 bytes)
# download-with: you-get --itag=242 [URL]
- itag: 395
container: mp4
quality: 320x240
size: 0.5 MiB (550743 bytes)
# download-with: you-get --itag=395 [URL]
- itag: 133
container: mp4
quality: 320x240
size: 0.5 MiB (498558 bytes)
# download-with: you-get --itag=133 [URL]
- itag: 278
container: webm
quality: 192x144
size: 0.4 MiB (392857 bytes)
# download-with: you-get --itag=278 [URL]
- itag: 160
container: mp4
quality: 192x144
size: 0.4 MiB (370882 bytes)
# download-with: you-get --itag=160 [URL]
- itag: 394
container: mp4
quality: 192x144
size: 0.4 MiB (367261 bytes)
# download-with: you-get --itag=394 [URL]
[ DEFAULT ] _________________________________
- itag: 43
container: webm
quality: medium
size: 0.5 MiB (564215 bytes)
size: 0.5 MiB (568748 bytes)
# download-with: you-get --itag=43 [URL]
- itag: 18
container: mp4
quality: medium
# download-with: you-get --itag=18 [URL]
- itag: 5
container: flv
quality: small
# download-with: you-get --itag=5 [URL]
# download-with: you-get --itag=18 [URL]
- itag: 36
container: 3gp
@ -176,21 +210,22 @@ streams: # Available quality and codecs
# download-with: you-get --itag=17 [URL]
```
The format marked with `DEFAULT` is the one you will get by default. If that looks cool to you, download it:
By default, the one on the top is the one you will get. If that looks cool to you, download it:
```
$ you-get 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
site: YouTube
title: Me at the zoo
stream:
- itag: 43
- itag: 242
container: webm
quality: medium
size: 0.5 MiB (564215 bytes)
# download-with: you-get --itag=43 [URL]
quality: 320x240
size: 0.6 MiB (618358 bytes)
# download-with: you-get --itag=242 [URL]
Downloading zoo.webm ...
100.0% ( 0.5/0.5 MB) ├████████████████████████████████████████┤[1/1] 7 MB/s
Downloading Me at the zoo.webm ...
100% ( 0.6/ 0.6MB) ├██████████████████████████████████████████████████████████████████████████████┤[2/2] 2 MB/s
Merging video parts... Merged into Me at the zoo.webm
Saving Me at the zoo.en.srt ... Done.
```
@ -292,7 +327,7 @@ However, the system proxy setting (i.e. the environment variable `http_proxy`) i
### Watch a video
Use the `--player`/`-p` option to feed the video into your media player of choice, e.g. `mplayer` or `vlc`, instead of downloading it:
Use the `--player`/`-p` option to feed the video into your media player of choice, e.g. `mpv` or `vlc`, instead of downloading it:
```
$ you-get -p vlc 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
@ -333,33 +368,29 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| VK | <http://vk.com/> |✓|✓| |
| Vine | <https://vine.co/> |✓| | |
| Vimeo | <https://vimeo.com/> |✓| | |
| Vidto | <http://vidto.me/> |✓| | |
| Videomega | <http://videomega.tv/> |✓| | |
| Veoh | <http://www.veoh.com/> |✓| | |
| **Tumblr** | <https://www.tumblr.com/> |✓|✓|✓|
| TED | <http://www.ted.com/> |✓| | |
| SoundCloud | <https://soundcloud.com/> | | |✓|
| SHOWROOM | <https://www.showroom-live.com/> |✓| | |
| Pinterest | <https://www.pinterest.com/> | |✓| |
| MusicPlayOn | <http://en.musicplayon.com/> |✓| | |
| MTV81 | <http://www.mtv81.com/> |✓| | |
| Mixcloud | <https://www.mixcloud.com/> | | |✓|
| Metacafe | <http://www.metacafe.com/> |✓| | |
| Magisto | <http://www.magisto.com/> |✓| | |
| Khan Academy | <https://www.khanacademy.org/> |✓| | |
| JPopsuki TV | <http://www.jpopsuki.tv/> |✓| | |
| Internet Archive | <https://archive.org/> |✓| | |
| **Instagram** | <https://instagram.com/> |✓|✓| |
| InfoQ | <http://www.infoq.com/presentations/> |✓| | |
| Imgur | <http://imgur.com/> | |✓| |
| Heavy Music Archive | <http://www.heavy-music.ru/> | | |✓|
| **Google+** | <https://plus.google.com/> |✓|✓| |
| Freesound | <http://www.freesound.org/> | | |✓|
| Flickr | <https://www.flickr.com/> |✓|✓| |
| FC2 Video | <http://video.fc2.com/> |✓| | |
| Facebook | <https://www.facebook.com/> |✓| | |
| eHow | <http://www.ehow.com/> |✓| | |
| Dailymotion | <http://www.dailymotion.com/> |✓| | |
| Coub | <http://coub.com/> |✓| | |
| CBS | <http://www.cbs.com/> |✓| | |
| Bandcamp | <http://bandcamp.com/> | | |✓|
| AliveThai | <http://alive.in.th/> |✓| | |
@ -368,14 +399,12 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| **niconico<br/>ニコニコ動画** | <http://www.nicovideo.jp/> |✓| | |
| **163<br/>网易视频<br/>网易云音乐** | <http://v.163.com/><br/><http://music.163.com/> |✓| |✓|
| 56网 | <http://www.56.com/> |✓| | |
| **AcFun** | <http://www.acfun.tv/> |✓| | |
| **AcFun** | <http://www.acfun.cn/> |✓| | |
| **Baidu<br/>百度贴吧** | <http://tieba.baidu.com/> |✓|✓| |
| 爆米花网 | <http://www.baomihua.com/> |✓| | |
| **bilibili<br/>哔哩哔哩** | <http://www.bilibili.com/> |✓| | |
| Dilidili | <http://www.dilidili.com/> |✓| | |
| 豆瓣 | <http://www.douban.com/> | | |✓|
| **bilibili<br/>哔哩哔哩** | <http://www.bilibili.com/> |✓|✓|✓|
| 豆瓣 | <http://www.douban.com/> |✓| |✓|
| 斗鱼 | <http://www.douyutv.com/> |✓| | |
| Panda<br/>熊猫 | <http://www.panda.tv/> |✓| | |
| 凤凰视频 | <http://v.ifeng.com/> |✓| | |
| 风行网 | <http://www.fun.tv/> |✓| | |
| iQIYI<br/>爱奇艺 | <http://www.iqiyi.com/> |✓| | |
@ -387,26 +416,32 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| 荔枝FM | <http://www.lizhi.fm/> | | |✓|
| 秒拍 | <http://www.miaopai.com/> |✓| | |
| MioMio弹幕网 | <http://www.miomio.tv/> |✓| | |
| MissEvan<br/>猫耳FM | <http://www.missevan.com/> | | |✓|
| 痞客邦 | <https://www.pixnet.net/> |✓| | |
| PPTV聚力 | <http://www.pptv.com/> |✓| | |
| 齐鲁网 | <http://v.iqilu.com/> |✓| | |
| QQ<br/>腾讯视频 | <http://v.qq.com/> |✓| | |
| 企鹅直播 | <http://live.qq.com/> |✓| | |
| 阡陌视频 | <http://qianmo.com/> |✓| | |
| THVideo | <http://thvideo.tv/> |✓| | |
| Sina<br/>新浪视频<br/>微博秒拍视频 | <http://video.sina.com.cn/><br/><http://video.weibo.com/> |✓| | |
| Sohu<br/>搜狐视频 | <http://tv.sohu.com/> |✓| | |
| 天天动听 | <http://www.dongting.com/> | | |✓|
| **Tudou<br/>土豆** | <http://www.tudou.com/> |✓| | |
| 虾米 | <http://www.xiami.com/> | | |✓|
| 虾米 | <http://www.xiami.com/> || |✓|
| 阳光卫视 | <http://www.isuntv.com/> |✓| | |
| **音悦Tai** | <http://www.yinyuetai.com/> |✓| | |
| **Youku<br/>优酷** | <http://www.youku.com/> |✓| | |
| 战旗TV | <http://www.zhanqi.tv/lives> |✓| | |
| 央视网 | <http://www.cntv.cn/> |✓| | |
| 花瓣 | <http://huaban.com/> | |✓| |
| Naver<br/>네이버 | <http://tvcast.naver.com/> |✓| | |
| 芒果TV | <http://www.mgtv.com/> |✓| | |
| 火猫TV | <http://www.huomao.com/> |✓| | |
| 阳光宽频网 | <http://www.365yg.com/> |✓| | |
| 西瓜视频 | <https://www.ixigua.com/> |✓| | |
| 新片场 | <https://www.xinpianchang.com/> |✓| | |
| 快手 | <https://www.kuaishou.com/> |✓|✓| |
| 抖音 | <https://www.douyin.com/> |✓| | |
| TikTok | <https://www.tiktok.com/> |✓| | |
| 中国体育(TV) | <http://v.zhibo.tv/> </br><http://video.zhibo.tv/> |✓| | |
| 知乎 | <https://www.zhihu.com/> |✓| | |
For all other sites not on the list, the universal extractor will take care of finding and downloading interesting resources from the page.
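As a rough, hypothetical illustration (not the actual `you-get` implementation), a universal-extractor fallback boils down to fetching the page and scanning it for direct media URLs:

```python
# Hypothetical sketch only: fetch a page and look for direct links to common
# media containers. you-get's real universal extractor is more elaborate.
import re
from urllib.request import Request, urlopen

def find_media_urls(page_url):
    req = Request(page_url, headers={'User-Agent': 'Mozilla/5.0'})
    html = urlopen(req).read().decode('utf-8', errors='ignore')
    # match absolute URLs that end in a well-known media file extension
    pattern = r'https?://[^\s<>"]+\.(?:mp4|webm|flv|mp3|m4a|jpg|png)'
    return sorted(set(re.findall(pattern, html)))
```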
@ -414,19 +449,13 @@ For all other sites not on the list, the universal extractor will take care of f
If something is broken and `you-get` can't get you things you want, don't panic. (Yes, this happens all the time!)
Check if it's already a known problem on <https://github.com/soimort/you-get/wiki/Known-Bugs>, and search on the [list of open issues](https://github.com/soimort/you-get/issues). If it has not been reported yet, open a new issue, with detailed command-line output attached.
Check if it's already a known problem on <https://github.com/soimort/you-get/wiki/Known-Bugs>. If not, follow the guidelines on [how to report an issue](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md).
## Getting Involved
You can reach us on the Gitter channel [#soimort/you-get](https://gitter.im/soimort/you-get) (here's how you [set up your IRC client](http://irc.gitter.im) for Gitter). If you have a quick question regarding `you-get`, ask it there.
All kinds of pull requests are welcome. However, there are a few guidelines to follow:
* The [`develop`](https://github.com/soimort/you-get/tree/develop) branch is where your pull request should go.
* Remember to rebase.
* Document your PR clearly, and if applicable, provide some sample links for reviewers to test with.
* Write well-formatted, easy-to-understand commit messages. If you don't know how, look at existing ones.
* We will not ask you to sign a CLA, but you must assure that your code can be legally redistributed (under the terms of the MIT license).
If you are seeking to report an issue or contribute, please make sure to read [the guidelines](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md) first.
## Legal Issues
@ -450,6 +479,6 @@ We only ship the code here, and how you are going to use it is left to your own
## Authors
Made by [@soimort](https://github.com/soimort), who is in turn powered by :coffee:, :pizza: and :ramen:.
Made by [@soimort](https://github.com/soimort), who is in turn powered by :coffee:, :beer: and :ramen:.
You can find the [list of all contributors](https://github.com/soimort/you-get/graphs/contributors) here.


@ -41,5 +41,9 @@ setup(
classifiers = proj_info['classifiers'],
entry_points = {'console_scripts': proj_info['console_scripts']}
entry_points = {'console_scripts': proj_info['console_scripts']},
extras_require={
'socks': ['PySocks'],
}
)


@ -1,7 +1,9 @@
#!/usr/bin/env python
''' WIP
def main():
script_main('you-get', any_download, any_download_playlist)
if __name__ == "__main__":
main()
'''

File diff suppressed because it is too large.


@ -1,10 +1,11 @@
#!/usr/bin/env python
from .common import match1, maybe_print, download_urls, get_filename, parse_host, set_proxy, unset_proxy
from .common import match1, maybe_print, download_urls, get_filename, parse_host, set_proxy, unset_proxy, get_content, dry_run, player
from .common import print_more_compatible as print
from .util import log
from . import json_output
import os
import sys
class Extractor():
def __init__(self, *args):
@ -22,12 +23,18 @@ class VideoExtractor():
self.url = None
self.title = None
self.vid = None
self.m3u8_url = None
self.streams = {}
self.streams_sorted = []
self.audiolang = None
self.password_protected = False
self.dash_streams = {}
self.caption_tracks = {}
self.out = False
self.ua = None
self.referer = None
self.danmaku = None
self.lyrics = None
if args:
self.url = args[0]
@ -39,6 +46,8 @@ class VideoExtractor():
if 'extractor_proxy' in kwargs and kwargs['extractor_proxy']:
set_proxy(parse_host(kwargs['extractor_proxy']))
self.prepare(**kwargs)
if self.out:
return
if 'extractor_proxy' in kwargs and kwargs['extractor_proxy']:
unset_proxy()
@ -98,9 +107,13 @@ class VideoExtractor():
if 'quality' in stream:
print(" quality: %s" % stream['quality'])
if 'size' in stream:
if 'size' in stream and 'container' in stream and stream['container'].lower() != 'm3u8':
if stream['size'] != float('inf') and stream['size'] != 0:
print(" size: %s MiB (%s bytes)" % (round(stream['size'] / 1048576, 1), stream['size']))
if 'm3u8_url' in stream:
print(" m3u8_url: {}".format(stream['m3u8_url']))
if 'itag' in stream:
print(" # download-with: %s" % log.sprint("you-get --itag=%s [URL]" % stream_id, log.UNDERLINE))
else:
@ -119,6 +132,8 @@ class VideoExtractor():
print(" url: %s" % self.url)
print()
sys.stdout.flush()
def p(self, stream_id=None):
maybe_print("site: %s" % self.__class__.name)
maybe_print("title: %s" % self.title)
@ -143,6 +158,7 @@ class VideoExtractor():
for stream in itags:
self.p_stream(stream)
# Print all other available streams
if self.streams_sorted:
print(" [ DEFAULT ] %s" % ('_' * 33))
for stream in self.streams_sorted:
self.p_stream(stream['id'] if 'id' in stream else stream['itag'])
@ -153,6 +169,8 @@ class VideoExtractor():
print(" - lang: {}".format(i['lang']))
print(" download-url: {}\n".format(i['url']))
sys.stdout.flush()
def p_playlist(self, stream_id=None):
maybe_print("site: %s" % self.__class__.name)
print("playlist: %s" % self.title)
@ -183,6 +201,13 @@ class VideoExtractor():
stream_id = kwargs['stream_id']
else:
# Download stream with the best quality
from .processor.ffmpeg import has_ffmpeg_installed
if has_ffmpeg_installed() and player is None and self.dash_streams or not self.streams_sorted:
#stream_id = list(self.dash_streams)[-1]
itags = sorted(self.dash_streams,
key=lambda i: -self.dash_streams[i]['size'])
stream_id = itags[0]
else:
stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
if 'index' not in kwargs:
@ -199,16 +224,26 @@ class VideoExtractor():
ext = self.dash_streams[stream_id]['container']
total_size = self.dash_streams[stream_id]['size']
if ext == 'm3u8' or ext == 'm4a':
ext = 'mp4'
if not urls:
log.wtf('[Failed] Cannot extract video source.')
# For legacy main()
download_urls(urls, self.title, ext, total_size,
headers = {}
if self.ua is not None:
headers['User-Agent'] = self.ua
if self.referer is not None:
headers['Referer'] = self.referer
download_urls(urls, self.title, ext, total_size, headers=headers,
output_dir=kwargs['output_dir'],
merge=kwargs['merge'],
av=stream_id in self.dash_streams)
if not kwargs['caption']:
print('Skipping captions.')
if 'caption' not in kwargs or not kwargs['caption']:
print('Skipping captions or danmaku.')
return
for lang in self.caption_tracks:
filename = '%s.%s.srt' % (get_filename(self.title), lang)
print('Saving %s ... ' % filename, end="", flush=True)
@ -218,7 +253,20 @@ class VideoExtractor():
x.write(srt)
print('Done.')
if self.danmaku is not None and not dry_run:
filename = '{}.cmt.xml'.format(get_filename(self.title))
print('Downloading {} ...\n'.format(filename))
with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp:
fp.write(self.danmaku)
if self.lyrics is not None and not dry_run:
filename = '{}.lrc'.format(get_filename(self.title))
print('Downloading {} ...\n'.format(filename))
with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp:
fp.write(self.lyrics)
# For main_dev()
#download_urls(urls, self.title, self.streams[stream_id]['container'], self.streams[stream_id]['size'])
keep_obj = kwargs.get('keep_obj', False)
if not keep_obj:
self.__init__()


@ -11,9 +11,10 @@ from .bokecc import *
from .cbs import *
from .ckplayer import *
from .cntv import *
from .coub import *
from .dailymotion import *
from .dilidili import *
from .douban import *
from .douyin import *
from .douyutv import *
from .ehow import *
from .facebook import *
@ -23,7 +24,7 @@ from .freesound import *
from .funshion import *
from .google import *
from .heavymusic import *
from .huaban import *
from .icourses import *
from .ifeng import *
from .imgur import *
from .infoq import *
@ -32,12 +33,15 @@ from .interest import *
from .iqilu import *
from .iqiyi import *
from .joy import *
from .jpopsuki import *
from .khan import *
from .ku6 import *
from .kakao import *
from .kuaishou import *
from .kugou import *
from .kuwo import *
from .le import *
from .lizhi import *
from .longzhu import *
from .magisto import *
from .metacafe import *
from .mgtv import *
@ -45,41 +49,41 @@ from .miaopai import *
from .miomio import *
from .mixcloud import *
from .mtv81 import *
from .musicplayon import *
from .nanagogo import *
from .naver import *
from .netease import *
from .nicovideo import *
from .panda import *
from .pinterest import *
from .pixnet import *
from .pptv import *
from .qianmo import *
from .qie import *
from .qingting import *
from .qq import *
from .showroom import *
from .sina import *
from .sohu import *
from .soundcloud import *
from .suntv import *
from .ted import *
from .theplatform import *
from .thvideo import *
from .tiktok import *
from .tucao import *
from .tudou import *
from .tumblr import *
from .twitter import *
from .ucas import *
from .veoh import *
from .videomega import *
from .vimeo import *
from .vine import *
from .vk import *
from .w56 import *
from .wanmen import *
from .xiami import *
from .xinpianchang import *
from .yinyuetai import *
from .yixia import *
from .youku import *
from .youtube import *
from .ted import *
from .khan import *
from .zhanqi import *
from .zhibo import *
from .zhihu import *


@ -1,92 +1,213 @@
#!/usr/bin/env python
__all__ = ['acfun_download']
from ..common import *
from ..extractor import VideoExtractor
from .le import letvcloud_download_by_vu
from .qq import qq_download_by_vid
from .sina import sina_download_by_vid
from .tudou import tudou_download_by_iid
from .youku import youku_download_by_vid, youku_open_download_by_vid
class AcFun(VideoExtractor):
name = "AcFun"
import json, re
stream_types = [
{'id': '2160P', 'qualityType': '2160p'},
{'id': '1080P60', 'qualityType': '1080p60'},
{'id': '720P60', 'qualityType': '720p60'},
{'id': '1080P+', 'qualityType': '1080p+'},
{'id': '1080P', 'qualityType': '1080p'},
{'id': '720P', 'qualityType': '720p'},
{'id': '540P', 'qualityType': '540p'},
{'id': '360P', 'qualityType': '360p'}
]
def get_srt_json(id):
url = 'http://danmu.aixifan.com/V2/%s' % id
return get_html(url)
def prepare(self, **kwargs):
assert re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/(\D|bangumi)/\D\D(\d+)', self.url)
def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False, **kwargs):
"""str, str, str, bool, bool ->None
if re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/\D/\D\D(\d+)', self.url):
html = get_content(self.url, headers=fake_headers)
json_text = match1(html, r"(?s)videoInfo\s*=\s*(\{.*?\});")
json_data = json.loads(json_text)
vid = json_data.get('currentVideoInfo').get('id')
up = json_data.get('user').get('name')
self.title = json_data.get('title')
video_list = json_data.get('videoList')
if len(video_list) > 1:
self.title += " - " + [p.get('title') for p in video_list if p.get('id') == vid][0]
currentVideoInfo = json_data.get('currentVideoInfo')
Download Acfun video by vid.
elif re.match("https?://[^\.]*\.*acfun\.[^\.]+/bangumi/aa(\d+)", self.url):
html = get_content(self.url, headers=fake_headers)
tag_script = match1(html, r'<script>\s*window\.pageInfo([^<]+)</script>')
json_text = tag_script[tag_script.find('{') : tag_script.find('};') + 1]
json_data = json.loads(json_text)
self.title = json_data['bangumiTitle'] + " " + json_data['episodeName'] + " " + json_data['title']
vid = str(json_data['videoId'])
up = "acfun"
currentVideoInfo = json_data.get('currentVideoInfo')
Call Acfun API, decide which site to use, and pass the job to its
extractor.
"""
#first call the main parasing API
info = json.loads(get_html('http://www.acfun.tv/video/getVideo.aspx?id=' + vid))
sourceType = info['sourceType']
#decide sourceId to know which extractor to use
if 'sourceId' in info: sourceId = info['sourceId']
# danmakuId = info['danmakuId']
#call extractor decided by sourceId
if sourceType == 'sina':
sina_download_by_vid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only)
elif sourceType == 'youku':
youku_download_by_vid(sourceId, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
elif sourceType == 'tudou':
tudou_download_by_iid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only)
elif sourceType == 'qq':
qq_download_by_vid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only)
elif sourceType == 'letv':
letvcloud_download_by_vu(sourceId, '2d8c027396', title, output_dir=output_dir, merge=merge, info_only=info_only)
elif sourceType == 'zhuzhan':
#As in Jul.28.2016, Acfun is using embsig to anti hotlink so we need to pass this
embsig = info['encode']
a = 'http://api.aixifan.com/plays/%s' % vid
s = json.loads(get_content(a, headers={'deviceType': '2'}))
if s['data']['source'] == "zhuzhan-youku":
sourceId = s['data']['sourceId']
youku_open_download_by_vid(client_id='908a519d032263f8', vid=sourceId, title=title, output_dir=output_dir,merge=merge, info_only=info_only, embsig = embsig, **kwargs)
else:
raise NotImplementedError(sourceType)
raise NotImplemented
if not info_only and not dry_run:
if not kwargs['caption']:
print('Skipping danmaku.')
if 'ksPlayJson' in currentVideoInfo:
durationMillis = currentVideoInfo['durationMillis']
ksPlayJson = ksPlayJson = json.loads( currentVideoInfo['ksPlayJson'] )
representation = ksPlayJson.get('adaptationSet')[0].get('representation')
stream_list = representation
for stream in stream_list:
m3u8_url = stream["url"]
size = durationMillis * stream["avgBitrate"] / 8
# size = float('inf')
container = 'mp4'
stream_id = stream["qualityLabel"]
quality = stream["qualityType"]
stream_data = dict(src=m3u8_url, size=size, container=container, quality=quality)
self.streams[stream_id] = stream_data
assert self.title and m3u8_url
self.title = unescape_html(self.title)
self.title = escape_file_path(self.title)
p_title = r1('active">([^<]+)', html)
self.title = '%s (%s)' % (self.title, up)
if p_title:
self.title = '%s - %s' % (self.title, p_title)
def download(self, **kwargs):
if 'json_output' in kwargs and kwargs['json_output']:
json_output.output(self)
elif 'info_only' in kwargs and kwargs['info_only']:
if 'stream_id' in kwargs and kwargs['stream_id']:
# Display the stream
stream_id = kwargs['stream_id']
if 'index' not in kwargs:
self.p(stream_id)
else:
self.p_i(stream_id)
else:
# Display all available streams
if 'index' not in kwargs:
self.p([])
else:
stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
self.p_i(stream_id)
else:
if 'stream_id' in kwargs and kwargs['stream_id']:
# Download the stream
stream_id = kwargs['stream_id']
else:
stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
if 'index' not in kwargs:
self.p(stream_id)
else:
self.p_i(stream_id)
if stream_id in self.streams:
url = self.streams[stream_id]['src']
ext = self.streams[stream_id]['container']
total_size = self.streams[stream_id]['size']
if ext == 'm3u8' or ext == 'm4a':
ext = 'mp4'
if not url:
log.wtf('[Failed] Cannot extract video source.')
# For legacy main()
headers = {}
if self.ua is not None:
headers['User-Agent'] = self.ua
if self.referer is not None:
headers['Referer'] = self.referer
download_url_ffmpeg(url, self.title, ext, output_dir=kwargs['output_dir'], merge=kwargs['merge'])
if 'caption' not in kwargs or not kwargs['caption']:
print('Skipping captions or danmaku.')
return
try:
title = get_filename(title)
print('Downloading %s ...\n' % (title + '.cmt.json'))
cmt = get_srt_json(vid)
with open(os.path.join(output_dir, title + '.cmt.json'), 'w', encoding='utf-8') as x:
x.write(cmt)
except:
pass
def acfun_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
assert re.match(r'http://[^\.]+.acfun.[^\.]+/\D/\D\D(\d+)', url)
html = get_html(url)
for lang in self.caption_tracks:
filename = '%s.%s.srt' % (get_filename(self.title), lang)
print('Saving %s ... ' % filename, end="", flush=True)
srt = self.caption_tracks[lang]
with open(os.path.join(kwargs['output_dir'], filename),
'w', encoding='utf-8') as x:
x.write(srt)
print('Done.')
title = r1(r'data-title="([^"]+)"', html)
if self.danmaku is not None and not dry_run:
filename = '{}.cmt.xml'.format(get_filename(self.title))
print('Downloading {} ...\n'.format(filename))
with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp:
fp.write(self.danmaku)
if self.lyrics is not None and not dry_run:
filename = '{}.lrc'.format(get_filename(self.title))
print('Downloading {} ...\n'.format(filename))
with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp:
fp.write(self.lyrics)
# For main_dev()
#download_urls(urls, self.title, self.streams[stream_id]['container'], self.streams[stream_id]['size'])
keep_obj = kwargs.get('keep_obj', False)
if not keep_obj:
self.__init__()
def acfun_download(self, url, output_dir='.', merge=True, info_only=False, **kwargs):
assert re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/(\D|bangumi)/\D\D(\d+)', url)
def getM3u8UrlFromCurrentVideoInfo(currentVideoInfo):
if 'playInfos' in currentVideoInfo:
return currentVideoInfo['playInfos'][0]['playUrls'][0]
elif 'ksPlayJson' in currentVideoInfo:
ksPlayJson = json.loads( currentVideoInfo['ksPlayJson'] )
representation = ksPlayJson.get('adaptationSet')[0].get('representation')
reps = []
for one in representation:
reps.append( (one['width']* one['height'], one['url'], one['backupUrl']) )
return max(reps)[1]
if re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/\D/\D\D(\d+)', url):
html = get_content(url, headers=fake_headers)
json_text = match1(html, r"(?s)videoInfo\s*=\s*(\{.*?\});")
json_data = json.loads(json_text)
vid = json_data.get('currentVideoInfo').get('id')
up = json_data.get('user').get('name')
title = json_data.get('title')
video_list = json_data.get('videoList')
if len(video_list) > 1:
title += " - " + [p.get('title') for p in video_list if p.get('id') == vid][0]
currentVideoInfo = json_data.get('currentVideoInfo')
m3u8_url = getM3u8UrlFromCurrentVideoInfo(currentVideoInfo)
elif re.match("https?://[^\.]*\.*acfun\.[^\.]+/bangumi/aa(\d+)", url):
html = get_content(url, headers=fake_headers)
tag_script = match1(html, r'<script>\s*window\.pageInfo([^<]+)</script>')
json_text = tag_script[tag_script.find('{') : tag_script.find('};') + 1]
json_data = json.loads(json_text)
title = json_data['bangumiTitle'] + " " + json_data['episodeName'] + " " + json_data['title']
vid = str(json_data['videoId'])
up = "acfun"
currentVideoInfo = json_data.get('currentVideoInfo')
m3u8_url = getM3u8UrlFromCurrentVideoInfo(currentVideoInfo)
else:
raise NotImplemented
assert title and m3u8_url
title = unescape_html(title)
title = escape_file_path(title)
assert title
p_title = r1('active">([^<]+)', html)
title = '%s (%s)' % (title, up)
if p_title:
title = '%s - %s' % (title, p_title)
vid = r1('data-vid="(\d+)"', html)
up = r1('data-name="([^"]+)"', html)
title = title + ' - ' + up
acfun_download_by_vid(vid, title,
output_dir=output_dir,
merge=merge,
info_only=info_only,
**kwargs)
print_info(site_info, title, 'm3u8', float('inf'))
if not info_only:
download_url_ffmpeg(m3u8_url, title, 'mp4', output_dir=output_dir, merge=merge)
site_info = "AcFun.tv"
download = acfun_download
site = AcFun()
site_info = "AcFun.cn"
download = site.download_by_url
download_playlist = playlist_not_supported('acfun')


@ -38,7 +38,7 @@ def baidu_get_song_title(data):
def baidu_get_song_lyric(data):
lrc = data['lrcLink']
return None if lrc is '' else "http://music.baidu.com%s" % lrc
return "http://music.baidu.com%s" % lrc if lrc else None
def baidu_download_song(sid, output_dir='.', merge=True, info_only=False):
@ -104,42 +104,54 @@ def baidu_download_album(aid, output_dir='.', merge=True, info_only=False):
def baidu_download(url, output_dir='.', stream_type=None, merge=True, info_only=False, **kwargs):
if re.match(r'http://pan.baidu.com', url):
if re.match(r'https?://pan.baidu.com', url):
real_url, title, ext, size = baidu_pan_download(url)
print_info('BaiduPan', title, ext, size)
if not info_only:
print('Hold on...')
time.sleep(5)
download_urls([real_url], title, ext, size,
output_dir, url, merge=merge, faker=True)
elif re.match(r'http://music.baidu.com/album/\d+', url):
id = r1(r'http://music.baidu.com/album/(\d+)', url)
elif re.match(r'https?://music.baidu.com/album/\d+', url):
id = r1(r'https?://music.baidu.com/album/(\d+)', url)
baidu_download_album(id, output_dir, merge, info_only)
elif re.match('http://music.baidu.com/song/\d+', url):
id = r1(r'http://music.baidu.com/song/(\d+)', url)
elif re.match('https?://music.baidu.com/song/\d+', url):
id = r1(r'https?://music.baidu.com/song/(\d+)', url)
baidu_download_song(id, output_dir, merge, info_only)
elif re.match('http://tieba.baidu.com/', url):
elif re.match('https?://tieba.baidu.com/', url):
try:
# embedded videos
embed_download(url, output_dir, merge=merge, info_only=info_only)
embed_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
except:
# images
html = get_html(url)
title = r1(r'title:"([^"]+)"', html)
vhsrc = re.findall(r'"BDE_Image"[^>]+src="([^"]+\.mp4)"', html) or \
re.findall(r'vhsrc="([^"]+)"', html)
if len(vhsrc) > 0:
ext = 'mp4'
size = url_size(vhsrc[0])
print_info(site_info, title, ext, size)
if not info_only:
download_urls(vhsrc, title, ext, size,
output_dir=output_dir, merge=False)
items = re.findall(
r'//imgsrc.baidu.com/forum/w[^"]+/([^/"]+)', html)
urls = ['http://imgsrc.baidu.com/forum/pic/item/' + i
r'//tiebapic.baidu.com/forum/w[^"]+/([^/"]+)', html)
urls = ['http://tiebapic.baidu.com/forum/pic/item/' + i
for i in set(items)]
# handle albums
kw = r1(r'kw=([^&]+)', html) or r1(r"kw:'([^']+)'", html)
tid = r1(r'tid=(\d+)', html) or r1(r"tid:'([^']+)'", html)
album_url = 'http://tieba.baidu.com/photo/g/bw/picture/list?kw=%s&tid=%s' % (
kw, tid)
album_url = 'http://tieba.baidu.com/photo/g/bw/picture/list?kw=%s&tid=%s&pe=%s' % (kw, tid, 1000)
album_info = json.loads(get_content(album_url))
for i in album_info['data']['pic_list']:
urls.append(
'http://imgsrc.baidu.com/forum/pic/item/' + i['pic_id'] + '.jpg')
'http://tiebapic.baidu.com/forum/pic/item/' + i['pic_id'] + '.jpg')
ext = 'jpg'
size = float('Inf')
@ -210,9 +222,6 @@ def baidu_pan_download(url):
title_wrapped = json.loads('{"wrapper":"%s"}' % title)
title = title_wrapped['wrapper']
logging.debug(real_url)
print_info(site_info, title, ext, size)
print('Hold on...')
time.sleep(5)
return real_url, title, ext, size


@ -6,6 +6,16 @@ from ..common import *
import urllib
def baomihua_headers(referer=None, cookie=None):
# a reasonable UA
ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'
headers = {'Accept': '*/*', 'Accept-Language': 'en-US,en;q=0.5', 'User-Agent': ua}
if referer is not None:
headers.update({'Referer': referer})
if cookie is not None:
headers.update({'Cookie': cookie})
return headers
def baomihua_download_by_id(id, title=None, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html('http://play.baomihua.com/getvideourl.aspx?flvid=%s&devicetype=phone_app' % id)
host = r1(r'host=([^&]*)', html)
@ -14,11 +24,12 @@ def baomihua_download_by_id(id, title=None, output_dir='.', merge=True, info_onl
assert type
vid = r1(r'&stream_name=([^&]*)', html)
assert vid
url = "http://%s/pomoho_video/%s.%s" % (host, vid, type)
_, ext, size = url_info(url)
dir_str = r1(r'&dir=([^&]*)', html).strip()
url = "http://%s/%s/%s.%s" % (host, dir_str, vid, type)
_, ext, size = url_info(url, headers=baomihua_headers())
print_info(site_info, title, type, size)
if not info_only:
download_urls([url], title, ext, size, output_dir, merge = merge)
download_urls([url], title, ext, size, output_dir, merge = merge, headers=baomihua_headers())
def baomihua_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url)


@ -1,196 +1,770 @@
#!/usr/bin/env python
__all__ = ['bilibili_download']
from ..common import *
from .sina import sina_download_by_vid
from .tudou import tudou_download_by_id
from .youku import youku_download_by_vid
from ..extractor import VideoExtractor
import hashlib
import re
appkey = 'f3bb208b3d081dc8'
SECRETKEY_MINILOADER = '1c15888dc316e05a15fdd0a02ed6584f'
class Bilibili(VideoExtractor):
name = "Bilibili"
def get_srt_xml(id):
url = 'http://comment.bilibili.com/%s.xml' % id
return get_html(url)
# Bilibili media encoding options, in descending quality order.
stream_types = [
{'id': 'hdflv2_4k', 'quality': 120, 'audio_quality': 30280,
'container': 'FLV', 'video_resolution': '2160p', 'desc': '超清 4K'},
{'id': 'flv_p60', 'quality': 116, 'audio_quality': 30280,
'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P60'},
{'id': 'hdflv2', 'quality': 112, 'audio_quality': 30280,
'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P+'},
{'id': 'flv', 'quality': 80, 'audio_quality': 30280,
'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P'},
{'id': 'flv720_p60', 'quality': 74, 'audio_quality': 30280,
'container': 'FLV', 'video_resolution': '720p', 'desc': '高清 720P60'},
{'id': 'flv720', 'quality': 64, 'audio_quality': 30280,
'container': 'FLV', 'video_resolution': '720p', 'desc': '高清 720P'},
{'id': 'hdmp4', 'quality': 48, 'audio_quality': 30280,
'container': 'MP4', 'video_resolution': '720p', 'desc': '高清 720P (MP4)'},
{'id': 'flv480', 'quality': 32, 'audio_quality': 30280,
'container': 'FLV', 'video_resolution': '480p', 'desc': '清晰 480P'},
{'id': 'flv360', 'quality': 16, 'audio_quality': 30216,
'container': 'FLV', 'video_resolution': '360p', 'desc': '流畅 360P'},
# 'quality': 15?
{'id': 'mp4', 'quality': 0},
{'id': 'jpg', 'quality': 0},
]
def parse_srt_p(p):
fields = p.split(',')
assert len(fields) == 8, fields
time, mode, font_size, font_color, pub_time, pool, user_id, history = fields
time = float(time)
@staticmethod
def height_to_quality(height, qn):
if height <= 360 and qn <= 16:
return 16
elif height <= 480 and qn <= 32:
return 32
elif height <= 720 and qn <= 64:
return 64
elif height <= 1080 and qn <= 80:
return 80
elif height <= 1080 and qn <= 112:
return 112
else:
return 120
mode = int(mode)
assert 1 <= mode <= 8
# mode 1~3: scrolling
# mode 4: bottom
# mode 5: top
# mode 6: reverse?
# mode 7: position
# mode 8: advanced
@staticmethod
def bilibili_headers(referer=None, cookie=None):
# a reasonable UA
ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'
headers = {'Accept': '*/*', 'Accept-Language': 'en-US,en;q=0.5', 'User-Agent': ua}
if referer is not None:
headers.update({'Referer': referer})
if cookie is not None:
headers.update({'Cookie': cookie})
return headers
pool = int(pool)
assert 0 <= pool <= 2
# pool 0: normal
# pool 1: srt
# pool 2: special?
@staticmethod
def bilibili_api(avid, cid, qn=0):
return 'https://api.bilibili.com/x/player/playurl?avid=%s&cid=%s&qn=%s&type=&otype=json&fnver=0&fnval=16&fourk=1' % (avid, cid, qn)
font_size = int(font_size)
@staticmethod
def bilibili_audio_api(sid):
return 'https://www.bilibili.com/audio/music-service-c/web/url?sid=%s' % sid
font_color = '#%06x' % int(font_color)
@staticmethod
def bilibili_audio_info_api(sid):
return 'https://www.bilibili.com/audio/music-service-c/web/song/info?sid=%s' % sid
return pool, mode, font_size, font_color
@staticmethod
def bilibili_audio_menu_info_api(sid):
return 'https://www.bilibili.com/audio/music-service-c/web/menu/info?sid=%s' % sid
@staticmethod
def bilibili_audio_menu_song_api(sid, ps=100):
return 'https://www.bilibili.com/audio/music-service-c/web/song/of-menu?sid=%s&pn=1&ps=%s' % (sid, ps)
def parse_srt_xml(xml):
d = re.findall(r'<d p="([^"]+)">(.*)</d>', xml)
for x, y in d:
p = parse_srt_p(x)
raise NotImplementedError()
@staticmethod
def bilibili_bangumi_api(avid, cid, ep_id, qn=0, fnval=16):
return 'https://api.bilibili.com/pgc/player/web/playurl?avid=%s&cid=%s&qn=%s&type=&otype=json&ep_id=%s&fnver=0&fnval=%s' % (avid, cid, qn, ep_id, fnval)
@staticmethod
def bilibili_interface_api(cid, qn=0):
entropy = 'rbMCKn@KuamXWlPMoJGsKcbiJKUfkPF_8dABscJntvqhRSETg'
appkey, sec = ''.join([chr(ord(i) + 2) for i in entropy[::-1]]).split(':')
params = 'appkey=%s&cid=%s&otype=json&qn=%s&quality=%s&type=' % (appkey, cid, qn, qn)
chksum = hashlib.md5(bytes(params + sec, 'utf8')).hexdigest()
return 'https://interface.bilibili.com/v2/playurl?%s&sign=%s' % (params, chksum)
def parse_cid_playurl(xml):
from xml.dom.minidom import parseString
@staticmethod
def bilibili_live_api(cid):
return 'https://api.live.bilibili.com/room/v1/Room/playUrl?cid=%s&quality=0&platform=web' % cid
@staticmethod
def bilibili_live_room_info_api(room_id):
return 'https://api.live.bilibili.com/room/v1/Room/get_info?room_id=%s' % room_id
@staticmethod
def bilibili_live_room_init_api(room_id):
return 'https://api.live.bilibili.com/room/v1/Room/room_init?id=%s' % room_id
@staticmethod
def bilibili_space_channel_api(mid, cid, pn=1, ps=100):
return 'https://api.bilibili.com/x/space/channel/video?mid=%s&cid=%s&pn=%s&ps=%s&order=0&jsonp=jsonp' % (mid, cid, pn, ps)
@staticmethod
def bilibili_space_favlist_api(fid, pn=1, ps=20):
return 'https://api.bilibili.com/x/v3/fav/resource/list?media_id=%s&pn=%s&ps=%s&order=mtime&type=0&tid=0&jsonp=jsonp' % (fid, pn, ps)
@staticmethod
def bilibili_space_video_api(mid, pn=1, ps=100):
return "https://api.bilibili.com/x/space/arc/search?mid=%s&pn=%s&ps=%s&tid=0&keyword=&order=pubdate&jsonp=jsonp" % (mid, pn, ps)
@staticmethod
def bilibili_vc_api(video_id):
return 'https://api.vc.bilibili.com/clip/v1/video/detail?video_id=%s' % video_id
@staticmethod
def bilibili_h_api(doc_id):
return 'https://api.vc.bilibili.com/link_draw/v1/doc/detail?doc_id=%s' % doc_id
@staticmethod
def url_size(url, faker=False, headers={},err_value=0):
try:
doc = parseString(xml.encode('utf-8'))
urls = [durl.getElementsByTagName('url')[0].firstChild.nodeValue for durl in doc.getElementsByTagName('durl')]
return urls
return url_size(url,faker,headers)
except:
return []
return err_value
def prepare(self, **kwargs):
self.stream_qualities = {s['quality']: s for s in self.stream_types}
def bilibili_download_by_cids(cids, title, output_dir='.', merge=True, info_only=False):
urls = []
for cid in cids:
sign_this = hashlib.md5(bytes('cid={cid}&from=miniplay&player=1{SECRETKEY_MINILOADER}'.format(cid = cid, SECRETKEY_MINILOADER = SECRETKEY_MINILOADER), 'utf-8')).hexdigest()
url = 'http://interface.bilibili.com/playurl?&cid=' + cid + '&from=miniplay&player=1' + '&sign=' + sign_this
urls += [i
if not re.match(r'.*\.qqvideo\.tc\.qq\.com', i)
else re.sub(r'.*\.qqvideo\.tc\.qq\.com', 'http://vsrc.store.qq.com', i)
for i in parse_cid_playurl(get_content(url))]
try:
html_content = get_content(self.url, headers=self.bilibili_headers(referer=self.url))
except:
html_content = '' # live always returns 400 (why?)
#self.title = match1(html_content,
# r'<h1 title="([^"]+)"')
type_ = ''
size = 0
for url in urls:
_, type_, temp = url_info(url)
size += temp
# redirect: watchlater
if re.match(r'https?://(www\.)?bilibili\.com/watchlater/#/(av(\d+)|BV(\S+)/?)', self.url):
avid = match1(self.url, r'/(av\d+)') or match1(self.url, r'/(BV\w+)')
p = int(match1(self.url, r'/p(\d+)') or '1')
self.url = 'https://www.bilibili.com/video/%s?p=%s' % (avid, p)
html_content = get_content(self.url, headers=self.bilibili_headers())
print_info(site_info, title, type_, size)
if not info_only:
download_urls(urls, title, type_, total_size=None, output_dir=output_dir, merge=merge)
def bilibili_download_by_cid(cid, title, output_dir='.', merge=True, info_only=False):
sign_this = hashlib.md5(bytes('cid={cid}&from=miniplay&player=1{SECRETKEY_MINILOADER}'.format(cid = cid, SECRETKEY_MINILOADER = SECRETKEY_MINILOADER), 'utf-8')).hexdigest()
url = 'http://interface.bilibili.com/playurl?&cid=' + cid + '&from=miniplay&player=1' + '&sign=' + sign_this
urls = [i
if not re.match(r'.*\.qqvideo\.tc\.qq\.com', i)
else re.sub(r'.*\.qqvideo\.tc\.qq\.com', 'http://vsrc.store.qq.com', i)
for i in parse_cid_playurl(get_content(url))]
type_ = ''
size = 0
for url in urls:
_, type_, temp = url_info(url)
size += temp or 0
print_info(site_info, title, type_, size)
if not info_only:
download_urls(urls, title, type_, total_size=None, output_dir=output_dir, merge=merge)
def bilibili_live_download_by_cid(cid, title, output_dir='.', merge=True, info_only=False):
api_url = 'http://live.bilibili.com/api/playurl?cid=' + cid
urls = parse_cid_playurl(get_content(api_url))
for url in urls:
_, type_, _ = url_info(url)
size = 0
print_info(site_info, title, type_, size)
if not info_only:
download_urls([url], title, type_, total_size=None, output_dir=output_dir, merge=merge)
def bilibili_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_content(url)
if re.match(r'https?://bangumi\.bilibili\.com/', url):
# quick hack for bangumi URLs
url = r1(r'"([^"]+)" class="v-av-link"', html)
html = get_content(url)
title = r1_of([r'<meta name="title" content="\s*([^<>]{1,999})\s*" />',
r'<h1[^>]*>\s*([^<>]+)\s*</h1>'], html)
if title:
title = unescape_html(title)
title = escape_file_path(title)
flashvars = r1_of([r'(cid=\d+)', r'(cid: \d+)', r'flashvars="([^"]+)"',
r'"https://[a-z]+\.bilibili\.com/secure,(cid=\d+)(?:&aid=\d+)?"'], html)
assert flashvars
flashvars = flashvars.replace(': ', '=')
t, cid = flashvars.split('=', 1)
cid = cid.split('&')[0]
if t == 'cid':
if re.match(r'https?://live\.bilibili\.com/', url):
title = r1(r'<title>\s*([^<>]+)\s*</title>', html)
bilibili_live_download_by_cid(cid, title, output_dir=output_dir, merge=merge, info_only=info_only)
# redirect: bangumi/play/ss -> bangumi/play/ep
# redirect: bangumi.bilibili.com/anime -> bangumi/play/ep
elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ss(\d+)', self.url) or \
re.match(r'https?://bangumi\.bilibili\.com/anime/(\d+)/play', self.url):
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
initial_state = json.loads(initial_state_text)
ep_id = initial_state['epList'][0]['id']
self.url = 'https://www.bilibili.com/bangumi/play/ep%s' % ep_id
html_content = get_content(self.url, headers=self.bilibili_headers(referer=self.url))
# sort it out
if re.match(r'https?://(www\.)?bilibili\.com/audio/au(\d+)', self.url):
sort = 'audio'
elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ep(\d+)', self.url):
sort = 'bangumi'
elif match1(html_content, r'<meta property="og:url" content="(https://www.bilibili.com/bangumi/play/[^"]+)"'):
sort = 'bangumi'
elif re.match(r'https?://live\.bilibili\.com/', self.url):
sort = 'live'
elif re.match(r'https?://vc\.bilibili\.com/video/(\d+)', self.url):
sort = 'vc'
elif re.match(r'https?://(www\.)?bilibili\.com/video/(av(\d+)|(BV(\S+)))', self.url):
sort = 'video'
elif re.match(r'https?://h\.?bilibili\.com/(\d+)', self.url):
sort = 'h'
else:
# multi-P
cids = []
pages = re.findall('<option value=\'([^\']*)\'', html)
titles = re.findall('<option value=.*>\s*([^<>]+)\s*</option>', html)
for i, page in enumerate(pages):
html = get_html("http://www.bilibili.com%s" % page)
flashvars = r1_of([r'(cid=\d+)',
r'flashvars="([^"]+)"',
r'"https://[a-z]+\.bilibili\.com/secure,(cid=\d+)(?:&aid=\d+)?"'], html)
if flashvars:
t, cid = flashvars.split('=', 1)
cids.append(cid.split('&')[0])
if url.endswith(page):
cids = [cid.split('&')[0]]
titles = [titles[i]]
break
# no multi-P
if not pages:
cids = [cid]
titles = [r1(r'<option value=.* selected>\s*([^<>]+)\s*</option>', html) or title]
for i in range(len(cids)):
bilibili_download_by_cid(cids[i],
titles[i],
output_dir=output_dir,
merge=merge,
info_only=info_only)
elif t == 'vid':
sina_download_by_vid(cid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
elif t == 'ykid':
youku_download_by_vid(cid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
elif t == 'uid':
tudou_download_by_id(cid, title, output_dir=output_dir, merge=merge, info_only=info_only)
else:
raise NotImplementedError(flashvars)
if not info_only and not dry_run:
if not kwargs['caption']:
print('Skipping danmaku.')
self.download_playlist_by_url(self.url, **kwargs)
return
title = get_filename(title)
print('Downloading %s ...\n' % (title + '.cmt.xml'))
xml = get_srt_xml(cid)
with open(os.path.join(output_dir, title + '.cmt.xml'), 'w', encoding='utf-8') as x:
x.write(xml)
# regular av video
if sort == 'video':
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
initial_state = json.loads(initial_state_text)
playinfo_text = match1(html_content, r'__playinfo__=(.*?)</script><script>') # FIXME
playinfo = json.loads(playinfo_text) if playinfo_text else None
html_content_ = get_content(self.url, headers=self.bilibili_headers(cookie='CURRENT_FNVAL=16'))
playinfo_text_ = match1(html_content_, r'__playinfo__=(.*?)</script><script>') # FIXME
playinfo_ = json.loads(playinfo_text_) if playinfo_text_ else None
# warn if it is a multi-part video
pn = initial_state['videoData']['videos']
if pn > 1 and not kwargs.get('playlist'):
log.w('This is a multipart video. (use --playlist to download all parts.)')
# set video title
self.title = initial_state['videoData']['title']
# refine title for a specific part, if it is a multi-part video
p = int(match1(self.url, r'[\?&]p=(\d+)') or match1(self.url, r'/index_(\d+)') or
'1') # use URL to decide p-number, not initial_state['p']
if pn > 1:
part = initial_state['videoData']['pages'][p - 1]['part']
self.title = '%s (P%s. %s)' % (self.title, p, part)
# construct playinfos
avid = initial_state['aid']
cid = initial_state['videoData']['pages'][p - 1]['cid'] # use p-number, not initial_state['videoData']['cid']
current_quality, best_quality = None, None
if playinfo is not None:
current_quality = playinfo['data']['quality'] or None # 0 indicates an error, fallback to None
if 'accept_quality' in playinfo['data'] and playinfo['data']['accept_quality'] != []:
best_quality = playinfo['data']['accept_quality'][0]
playinfos = []
if playinfo is not None:
playinfos.append(playinfo)
if playinfo_ is not None:
playinfos.append(playinfo_)
# get alternative formats from API
for qn in [120, 112, 80, 64, 32, 16]:
# automatic format for durl: qn=0
# for dash, qn does not matter
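# qn is bilibili's quality code; these values roughly map to 4K / 1080P+ / 1080P / 720P / 480P / 360P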
if current_quality is None or qn < current_quality:
api_url = self.bilibili_api(avid, cid, qn=qn)
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
api_playinfo = json.loads(api_content)
if api_playinfo['code'] == 0: # success
playinfos.append(api_playinfo)
else:
message = api_playinfo['data']['message']
if best_quality is None or qn <= best_quality:
api_url = self.bilibili_interface_api(cid, qn=qn)
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
api_playinfo_data = json.loads(api_content)
if api_playinfo_data.get('quality'):
playinfos.append({'code': 0, 'message': '0', 'ttl': 1, 'data': api_playinfo_data})
if not playinfos:
log.w(message)
# use bilibili error video instead
url = 'https://static.hdslb.com/error.mp4'
_, container, size = url_info(url)
self.streams['flv480'] = {'container': container, 'size': size, 'src': [url]}
return
for playinfo in playinfos:
quality = playinfo['data']['quality']
format_id = self.stream_qualities[quality]['id']
container = self.stream_qualities[quality]['container'].lower()
desc = self.stream_qualities[quality]['desc']
if 'durl' in playinfo['data']:
src, size = [], 0
for durl in playinfo['data']['durl']:
src.append(durl['url'])
size += durl['size']
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': src}
# DASH formats
if 'dash' in playinfo['data']:
audio_size_cache = {}
for video in playinfo['data']['dash']['video']:
# prefer the latter codecs!
s = self.stream_qualities[video['id']]
format_id = 'dash-' + s['id'] # prefix
container = 'mp4' # enforce MP4 container
desc = s['desc']
audio_quality = s['audio_quality']
baseurl = video['baseUrl']
size = self.url_size(baseurl, headers=self.bilibili_headers(referer=self.url))
# find matching audio track
if playinfo['data']['dash']['audio']:
audio_baseurl = playinfo['data']['dash']['audio'][0]['baseUrl']
for audio in playinfo['data']['dash']['audio']:
if int(audio['id']) == audio_quality:
audio_baseurl = audio['baseUrl']
break
if not audio_size_cache.get(audio_quality, False):
audio_size_cache[audio_quality] = self.url_size(audio_baseurl, headers=self.bilibili_headers(referer=self.url))
size += audio_size_cache[audio_quality]
self.dash_streams[format_id] = {'container': container, 'quality': desc,
'src': [[baseurl], [audio_baseurl]], 'size': size}
else:
self.dash_streams[format_id] = {'container': container, 'quality': desc,
'src': [[baseurl]], 'size': size}
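# DASH entries keep the video and audio URL lists separately in 'src' so they can be merged after download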
# get danmaku
self.danmaku = get_content('http://comment.bilibili.com/%s.xml' % cid)
# bangumi
elif sort == 'bangumi':
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
initial_state = json.loads(initial_state_text)
# warn if this bangumi has more than 1 video
epn = len(initial_state['epList'])
if epn > 1 and not kwargs.get('playlist'):
log.w('This bangumi currently has %s videos. (use --playlist to download all videos.)' % epn)
# set video title
self.title = initial_state['h1Title']
# construct playinfos
ep_id = initial_state['epInfo']['id']
avid = initial_state['epInfo']['aid']
cid = initial_state['epInfo']['cid']
playinfos = []
api_url = self.bilibili_bangumi_api(avid, cid, ep_id)
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
api_playinfo = json.loads(api_content)
if api_playinfo['code'] == 0: # success
playinfos.append(api_playinfo)
else:
log.e(api_playinfo['message'])
return
current_quality = api_playinfo['result']['quality']
# get alternative formats from API
for fnval in [8, 16]:
for qn in [120, 112, 80, 64, 32, 16]:
# automatic format for durl: qn=0
# for dash, qn does not matter
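# fnval selects the response format of the bangumi API (e.g. 16 requests DASH streams)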
if qn != current_quality:
api_url = self.bilibili_bangumi_api(avid, cid, ep_id, qn=qn, fnval=fnval)
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
api_playinfo = json.loads(api_content)
if api_playinfo['code'] == 0: # success
playinfos.append(api_playinfo)
for playinfo in playinfos:
if 'durl' in playinfo['result']:
quality = playinfo['result']['quality']
format_id = self.stream_qualities[quality]['id']
container = self.stream_qualities[quality]['container'].lower()
desc = self.stream_qualities[quality]['desc']
src, size = [], 0
for durl in playinfo['result']['durl']:
src.append(durl['url'])
size += durl['size']
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': src}
# DASH formats
if 'dash' in playinfo['result']:
for video in playinfo['result']['dash']['video']:
# playinfo['result']['quality'] does not reflect the correct quality of DASH stream
quality = self.height_to_quality(video['height'], video['id']) # convert height to quality code
s = self.stream_qualities[quality]
format_id = 'dash-' + s['id'] # prefix
container = 'mp4' # enforce MP4 container
desc = s['desc']
audio_quality = s['audio_quality']
baseurl = video['baseUrl']
size = url_size(baseurl, headers=self.bilibili_headers(referer=self.url))
# find matching audio track
audio_baseurl = playinfo['result']['dash']['audio'][0]['baseUrl']
for audio in playinfo['result']['dash']['audio']:
if int(audio['id']) == audio_quality:
audio_baseurl = audio['baseUrl']
break
size += url_size(audio_baseurl, headers=self.bilibili_headers(referer=self.url))
self.dash_streams[format_id] = {'container': container, 'quality': desc,
'src': [[baseurl], [audio_baseurl]], 'size': size}
# get danmaku
self.danmaku = get_content('http://comment.bilibili.com/%s.xml' % cid)
# vc video
elif sort == 'vc':
video_id = match1(self.url, r'https?://vc\.?bilibili\.com/video/(\d+)')
api_url = self.bilibili_vc_api(video_id)
api_content = get_content(api_url, headers=self.bilibili_headers())
api_playinfo = json.loads(api_content)
# set video title
self.title = '%s (%s)' % (api_playinfo['data']['user']['name'], api_playinfo['data']['item']['id'])
height = api_playinfo['data']['item']['height']
quality = self.height_to_quality(height) # convert height to quality code
s = self.stream_qualities[quality]
format_id = s['id']
container = 'mp4' # enforce MP4 container
desc = s['desc']
playurl = api_playinfo['data']['item']['video_playurl']
size = int(api_playinfo['data']['item']['video_size'])
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': [playurl]}
# live
elif sort == 'live':
m = re.match(r'https?://live\.bilibili\.com/(\w+)', self.url)
short_id = m.group(1)
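# the URL may carry a short room id; resolve it to the real room_id via the room init API first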
api_url = self.bilibili_live_room_init_api(short_id)
api_content = get_content(api_url, headers=self.bilibili_headers())
room_init_info = json.loads(api_content)
room_id = room_init_info['data']['room_id']
api_url = self.bilibili_live_room_info_api(room_id)
api_content = get_content(api_url, headers=self.bilibili_headers())
room_info = json.loads(api_content)
# set video title
self.title = room_info['data']['title'] + '.' + str(int(time.time()))
api_url = self.bilibili_live_api(room_id)
api_content = get_content(api_url, headers=self.bilibili_headers())
video_info = json.loads(api_content)
durls = video_info['data']['durl']
playurl = durls[0]['url']
container = 'flv' # enforce FLV container
self.streams['flv'] = {'container': container, 'quality': 'unknown',
'size': 0, 'src': [playurl]}
# audio
elif sort == 'audio':
m = re.match(r'https?://(?:www\.)?bilibili\.com/audio/au(\d+)', self.url)
sid = m.group(1)
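# fetch the song metadata (title and lyric URL) first, then query the audio API for the actual CDN play URL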
api_url = self.bilibili_audio_info_api(sid)
api_content = get_content(api_url, headers=self.bilibili_headers())
song_info = json.loads(api_content)
# set audio title
self.title = song_info['data']['title']
# get lyrics
self.lyrics = get_content(song_info['data']['lyric'])
api_url = self.bilibili_audio_api(sid)
api_content = get_content(api_url, headers=self.bilibili_headers())
audio_info = json.loads(api_content)
playurl = audio_info['data']['cdns'][0]
size = audio_info['data']['size']
container = 'mp4' # enforce MP4 container
self.streams['mp4'] = {'container': container,
'size': size, 'src': [playurl]}
# h images
elif sort == 'h':
m = re.match(r'https?://h\.?bilibili\.com/(\d+)', self.url)
doc_id = m.group(1)
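# h.bilibili.com galleries: collect every image URL from the API response and download them as one batch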
api_url = self.bilibili_h_api(doc_id)
api_content = get_content(api_url, headers=self.bilibili_headers())
h_info = json.loads(api_content)
urls = []
for pic in h_info['data']['item']['pictures']:
img_src = pic['img_src']
urls.append(img_src)
size = urls_size(urls)
self.title = doc_id
container = 'jpg' # enforce JPG container
self.streams[container] = {'container': container,
'size': size, 'src': urls}
def prepare_by_cid(self, avid, cid, title, html_content, playinfo, playinfo_, url):
# response for interaction video
# mainly for interactive videos: parts are distinguished by cid rather than by url
self.stream_qualities = {s['quality']: s for s in self.stream_types}
self.title = title
self.url = url
current_quality, best_quality = None, None
if playinfo is not None:
current_quality = playinfo['data']['quality'] or None # 0 indicates an error, fallback to None
if 'accept_quality' in playinfo['data'] and playinfo['data']['accept_quality'] != []:
best_quality = playinfo['data']['accept_quality'][0]
playinfos = []
if playinfo is not None:
playinfos.append(playinfo)
if playinfo_ is not None:
playinfos.append(playinfo_)
# get alternative formats from API
for qn in [80, 64, 32, 16]:
# automatic format for durl: qn=0
# for dash, qn does not matter
if current_quality is None or qn < current_quality:
api_url = self.bilibili_api(avid, cid, qn=qn)
api_content = get_content(api_url, headers=self.bilibili_headers())
api_playinfo = json.loads(api_content)
if api_playinfo['code'] == 0: # success
playinfos.append(api_playinfo)
else:
message = api_playinfo['data']['message']
if best_quality is None or qn <= best_quality:
api_url = self.bilibili_interface_api(cid, qn=qn)
api_content = get_content(api_url, headers=self.bilibili_headers())
api_playinfo_data = json.loads(api_content)
if api_playinfo_data.get('quality'):
playinfos.append({'code': 0, 'message': '0', 'ttl': 1, 'data': api_playinfo_data})
if not playinfos:
log.w(message)
# use bilibili error video instead
url = 'https://static.hdslb.com/error.mp4'
_, container, size = url_info(url)
self.streams['flv480'] = {'container': container, 'size': size, 'src': [url]}
return
for playinfo in playinfos:
quality = playinfo['data']['quality']
format_id = self.stream_qualities[quality]['id']
container = self.stream_qualities[quality]['container'].lower()
desc = self.stream_qualities[quality]['desc']
if 'durl' in playinfo['data']:
src, size = [], 0
for durl in playinfo['data']['durl']:
src.append(durl['url'])
size += durl['size']
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': src}
# DASH formats
if 'dash' in playinfo['data']:
audio_size_cache = {}
for video in playinfo['data']['dash']['video']:
# prefer the latter codecs!
s = self.stream_qualities[video['id']]
format_id = 'dash-' + s['id'] # prefix
container = 'mp4' # enforce MP4 container
desc = s['desc']
audio_quality = s['audio_quality']
baseurl = video['baseUrl']
size = self.url_size(baseurl, headers=self.bilibili_headers(referer=self.url))
# find matching audio track
if playinfo['data']['dash']['audio']:
audio_baseurl = playinfo['data']['dash']['audio'][0]['baseUrl']
for audio in playinfo['data']['dash']['audio']:
if int(audio['id']) == audio_quality:
audio_baseurl = audio['baseUrl']
break
if not audio_size_cache.get(audio_quality, False):
audio_size_cache[audio_quality] = self.url_size(audio_baseurl,
headers=self.bilibili_headers(referer=self.url))
size += audio_size_cache[audio_quality]
self.dash_streams[format_id] = {'container': container, 'quality': desc,
'src': [[baseurl], [audio_baseurl]], 'size': size}
else:
self.dash_streams[format_id] = {'container': container, 'quality': desc,
'src': [[baseurl]], 'size': size}
# get danmaku
self.danmaku = get_content('http://comment.bilibili.com/%s.xml' % cid)
def extract(self, **kwargs):
# set UA and referer for downloading
headers = self.bilibili_headers(referer=self.url)
self.ua, self.referer = headers['User-Agent'], headers['Referer']
if not self.streams_sorted:
# no stream is available
return
if 'stream_id' in kwargs and kwargs['stream_id']:
# extract the stream
stream_id = kwargs['stream_id']
if stream_id not in self.streams and stream_id not in self.dash_streams:
log.e('[Error] Invalid video format.')
log.e('Run \'-i\' command with no specific video format to view all available formats.')
exit(2)
else:
# extract stream with the best quality
stream_id = self.streams_sorted[0]['id']
def download_playlist_by_url(self, url, **kwargs):
self.url = url
kwargs['playlist'] = True
html_content = get_content(self.url, headers=self.bilibili_headers(referer=self.url))
# sort it out
if re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ep(\d+)', self.url):
sort = 'bangumi'
elif match1(html_content, r'<meta property="og:url" content="(https://www.bilibili.com/bangumi/play/[^"]+)"'):
sort = 'bangumi'
elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/media/md(\d+)', self.url) or \
re.match(r'https?://bangumi\.bilibili\.com/anime/(\d+)', self.url):
sort = 'bangumi_md'
elif re.match(r'https?://(www\.)?bilibili\.com/video/(av(\d+)|BV(\S+))', self.url):
sort = 'video'
elif re.match(r'https?://space\.?bilibili\.com/(\d+)/channel/detail\?.*cid=(\d+)', self.url):
sort = 'space_channel'
elif re.match(r'https?://space\.?bilibili\.com/(\d+)/favlist\?.*fid=(\d+)', self.url):
sort = 'space_favlist'
elif re.match(r'https?://space\.?bilibili\.com/(\d+)/video', self.url):
sort = 'space_video'
elif re.match(r'https?://(www\.)?bilibili\.com/audio/am(\d+)', self.url):
sort = 'audio_menu'
else:
log.e('[Error] Unsupported URL pattern.')
exit(1)
# regular av video
if sort == 'video':
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
initial_state = json.loads(initial_state_text)
aid = initial_state['videoData']['aid']
pn = initial_state['videoData']['videos']
if pn != len(initial_state['videoData']['pages']):  # interactive video (互动视频)
search_node_list = []
download_cid_set = set([initial_state['videoData']['cid']])
params = {
'id': 'cid:{}'.format(initial_state['videoData']['cid']),
'aid': str(aid)
}
urlcontent = get_content('https://api.bilibili.com/x/player.so?'+parse.urlencode(params), headers=self.bilibili_headers(referer='https://www.bilibili.com/video/av{}'.format(aid)))
graph_version = json.loads(urlcontent[urlcontent.find('<interaction>')+13:urlcontent.find('</interaction>')])['graph_version']
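# player.so returns XML; the <interaction> element carries JSON whose graph_version is required by the stein/nodeinfo API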
params = {
'aid': str(aid),
'graph_version': graph_version,
'platform': 'pc',
'portal': 0,
'screen': 0,
}
node_info = json.loads(get_content('https://api.bilibili.com/x/stein/nodeinfo?'+parse.urlencode(params)))
playinfo_text = match1(html_content, r'__playinfo__=(.*?)</script><script>') # FIXME
playinfo = json.loads(playinfo_text) if playinfo_text else None
html_content_ = get_content(self.url, headers=self.bilibili_headers(cookie='CURRENT_FNVAL=16'))
playinfo_text_ = match1(html_content_, r'__playinfo__=(.*?)</script><script>') # FIXME
playinfo_ = json.loads(playinfo_text_) if playinfo_text_ else None
self.prepare_by_cid(aid, initial_state['videoData']['cid'], initial_state['videoData']['title'] + ('P{}. {}'.format(1, node_info['data']['title'])), html_content, playinfo, playinfo_, url)
self.extract(**kwargs)
self.download(**kwargs)
for choice in node_info['data']['edges']['choices']:
search_node_list.append(choice['node_id'])
if choice['cid'] not in download_cid_set:
download_cid_set.add(choice['cid'])
self.prepare_by_cid(aid, choice['cid'], initial_state['videoData']['title'] + ('P{}. {}'.format(len(download_cid_set), choice['option'])), html_content, playinfo, playinfo_, url)
self.extract(**kwargs)
self.download(**kwargs)
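# breadth-first walk over the interaction graph: queue every choice's node_id and download each newly seen cid exactly once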
while len(search_node_list) > 0:
node_id = search_node_list.pop(0)
params.update({'node_id': node_id})
node_info = json.loads(get_content('https://api.bilibili.com/x/stein/nodeinfo?'+parse.urlencode(params)))
if node_info['data'].__contains__('edges'):
for choice in node_info['data']['edges']['choices']:
search_node_list.append(choice['node_id'])
if choice['cid'] not in download_cid_set:
download_cid_set.add(choice['cid'])
self.prepare_by_cid(aid, choice['cid'], initial_state['videoData']['title'] + ('P{}. {}'.format(len(download_cid_set), choice['option'])), html_content, playinfo, playinfo_, url)
try:
self.streams_sorted = [dict([('id', stream_type['id'])] + list(self.streams[stream_type['id']].items())) for stream_type in self.__class__.stream_types if stream_type['id'] in self.streams]
except:
self.streams_sorted = [dict([('itag', stream_type['itag'])] + list(self.streams[stream_type['itag']].items())) for stream_type in self.__class__.stream_types if stream_type['itag'] in self.streams]
self.extract(**kwargs)
self.download(**kwargs)
else:
playinfo_text = match1(html_content, r'__playinfo__=(.*?)</script><script>') # FIXME
playinfo = json.loads(playinfo_text) if playinfo_text else None
html_content_ = get_content(self.url, headers=self.bilibili_headers(cookie='CURRENT_FNVAL=16'))
playinfo_text_ = match1(html_content_, r'__playinfo__=(.*?)</script><script>') # FIXME
playinfo_ = json.loads(playinfo_text_) if playinfo_text_ else None
p = int(match1(self.url, r'[\?&]p=(\d+)') or match1(self.url, r'/index_(\d+)') or '1') - 1
for pi in range(p, pn):
self.prepare_by_cid(aid, initial_state['videoData']['pages'][pi]['cid'], '%s (P%s. %s)' % (initial_state['videoData']['title'], pi + 1, initial_state['videoData']['pages'][pi]['part']), html_content, playinfo, playinfo_, url)
try:
self.streams_sorted = [dict([('id', stream_type['id'])] + list(self.streams[stream_type['id']].items())) for stream_type in self.__class__.stream_types if stream_type['id'] in self.streams]
except:
self.streams_sorted = [dict([('itag', stream_type['itag'])] + list(self.streams[stream_type['itag']].items())) for stream_type in self.__class__.stream_types if stream_type['itag'] in self.streams]
self.extract(**kwargs)
self.download(**kwargs)
# purl = 'https://www.bilibili.com/video/av%s?p=%s' % (aid, pi+1)
# self.__class__().download_by_url(purl, **kwargs)
elif sort == 'bangumi':
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
initial_state = json.loads(initial_state_text)
epn, i = len(initial_state['epList']), 0
for ep in initial_state['epList']:
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
ep_id = ep['id']
epurl = 'https://www.bilibili.com/bangumi/play/ep%s/' % ep_id
self.__class__().download_by_url(epurl, **kwargs)
elif sort == 'bangumi_md':
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
initial_state = json.loads(initial_state_text)
epn, i = len(initial_state['mediaInfo']['episodes']), 0
for ep in initial_state['mediaInfo']['episodes']:
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
ep_id = ep['ep_id']
epurl = 'https://www.bilibili.com/bangumi/play/ep%s/' % ep_id
self.__class__().download_by_url(epurl, **kwargs)
elif sort == 'space_channel':
m = re.match(r'https?://space\.?bilibili\.com/(\d+)/channel/detail\?.*cid=(\d+)', self.url)
mid, cid = m.group(1), m.group(2)
api_url = self.bilibili_space_channel_api(mid, cid)
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
channel_info = json.loads(api_content)
# TBD: channel of more than 100 videos
epn, i = len(channel_info['data']['list']['archives']), 0
for video in channel_info['data']['list']['archives']:
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
url = 'https://www.bilibili.com/video/av%s' % video['aid']
self.__class__().download_playlist_by_url(url, **kwargs)
elif sort == 'space_favlist':
m = re.match(r'https?://space\.?bilibili\.com/(\d+)/favlist\?.*fid=(\d+)', self.url)
vmid, fid = m.group(1), m.group(2)
api_url = self.bilibili_space_favlist_api(fid)
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
favlist_info = json.loads(api_content)
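# number of pages = ceil(media_count / page size), computed via integer division plus a remainder check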
pc = favlist_info['data']['info']['media_count'] // len(favlist_info['data']['medias'])
if favlist_info['data']['info']['media_count'] % len(favlist_info['data']['medias']) != 0:
pc += 1
for pn in range(1, pc + 1):
log.w('Extracting %s of %s pages ...' % (pn, pc))
api_url = self.bilibili_space_favlist_api(fid, pn=pn)
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
favlist_info = json.loads(api_content)
epn, i = len(favlist_info['data']['medias']), 0
for video in favlist_info['data']['medias']:
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
url = 'https://www.bilibili.com/video/av%s' % video['id']
self.__class__().download_playlist_by_url(url, **kwargs)
elif sort == 'space_video':
m = re.match(r'https?://space\.?bilibili\.com/(\d+)/video', self.url)
mid = m.group(1)
api_url = self.bilibili_space_video_api(mid)
api_content = get_content(api_url, headers=self.bilibili_headers())
videos_info = json.loads(api_content)
pc = videos_info['data']['page']['count'] // videos_info['data']['page']['ps']
for pn in range(1, pc + 1):
api_url = self.bilibili_space_video_api(mid, pn=pn)
api_content = get_content(api_url, headers=self.bilibili_headers())
videos_info = json.loads(api_content)
epn, i = len(videos_info['data']['list']['vlist']), 0
for video in videos_info['data']['list']['vlist']:
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
url = 'https://www.bilibili.com/video/av%s' % video['aid']
self.__class__().download_playlist_by_url(url, **kwargs)
elif sort == 'audio_menu':
m = re.match(r'https?://(?:www\.)?bilibili\.com/audio/am(\d+)', self.url)
sid = m.group(1)
#api_url = self.bilibili_audio_menu_info_api(sid)
#api_content = get_content(api_url, headers=self.bilibili_headers())
#menu_info = json.loads(api_content)
api_url = self.bilibili_audio_menu_song_api(sid)
api_content = get_content(api_url, headers=self.bilibili_headers())
menusong_info = json.loads(api_content)
epn, i = len(menusong_info['data']['data']), 0
for song in menusong_info['data']['data']:
i += 1; log.w('Extracting %s of %s songs ...' % (i, epn))
url = 'https://www.bilibili.com/audio/au%s' % song['id']
self.__class__().download_by_url(url, **kwargs)
site_info = "bilibili.com"
download = bilibili_download
download_playlist = bilibili_download
site = Bilibili()
download = site.download_by_url
download_playlist = site.download_playlist_by_url
bilibili_download = download

View File

@ -52,10 +52,13 @@ class BokeCC(VideoExtractor):
raise
if title is None:
self.title = '_'.join([i.text for i in tree.iterfind('video/videomarks/videomark/markdesc')])
self.title = '_'.join([i.text for i in self.tree.iterfind('video/videomarks/videomark/markdesc')])
else:
self.title = title
if not title:
self.title = vid
for i in self.tree.iterfind('video/quality'):
quality = i.attrib ['value']
url = i[0].attrib['playurl']

View File

@ -6,10 +6,9 @@
__all__ = ['ckplayer_download']
from xml.etree import cElementTree as ET
from xml.etree import ElementTree as ET
from copy import copy
from ..common import *
#----------------------------------------------------------------------
def ckplayer_get_info_by_xml(ckinfo):
"""str->dict
@ -20,20 +19,22 @@ def ckplayer_get_info_by_xml(ckinfo):
'links': [],
'size': 0,
'flashvars': '',}
if '_text' in dictify(e)['ckplayer']['info'][0]['title'][0]: #title
video_dict['title'] = dictify(e)['ckplayer']['info'][0]['title'][0]['_text'].strip()
dictified = dictify(e)['ckplayer']
if 'info' in dictified:
if '_text' in dictified['info'][0]['title'][0]: #title
video_dict['title'] = dictified['info'][0]['title'][0]['_text'].strip()
#if dictify(e)['ckplayer']['info'][0]['title'][0]['_text'].strip(): #duration
#video_dict['title'] = dictify(e)['ckplayer']['info'][0]['title'][0]['_text'].strip()
if '_text' in dictify(e)['ckplayer']['video'][0]['size'][0]: #size exists for 1 piece
video_dict['size'] = sum([int(i['size'][0]['_text']) for i in dictify(e)['ckplayer']['video']])
if '_text' in dictified['video'][0]['size'][0]: #size exists for 1 piece
video_dict['size'] = sum([int(i['size'][0]['_text']) for i in dictified['video']])
if '_text' in dictify(e)['ckplayer']['video'][0]['file'][0]: #link exist
video_dict['links'] = [i['file'][0]['_text'].strip() for i in dictify(e)['ckplayer']['video']]
if '_text' in dictified['video'][0]['file'][0]: #link exist
video_dict['links'] = [i['file'][0]['_text'].strip() for i in dictified['video']]
if '_text' in dictify(e)['ckplayer']['flashvars'][0]:
video_dict['flashvars'] = dictify(e)['ckplayer']['flashvars'][0]['_text'].strip()
if '_text' in dictified['flashvars'][0]:
video_dict['flashvars'] = dictified['flashvars'][0]['_text'].strip()
return video_dict

View File

@ -1,49 +1,67 @@
#!/usr/bin/env python
__all__ = ['cntv_download', 'cntv_download_by_id']
from ..common import *
import json
import re
from ..common import get_content, r1, match1, playlist_not_supported
from ..extractor import VideoExtractor
def cntv_download_by_id(id, title = None, output_dir = '.', merge = True, info_only = False):
assert id
info = json.loads(get_html('http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + id))
title = title or info['title']
video = info['video']
alternatives = [x for x in video.keys() if x.endswith('hapters')]
#assert alternatives in (['chapters'], ['lowChapters', 'chapters'], ['chapters', 'lowChapters']), alternatives
chapters = video['chapters'] if 'chapters' in video else video['lowChapters']
urls = [x['url'] for x in chapters]
ext = r1(r'\.([^.]+)$', urls[0])
assert ext in ('flv', 'mp4')
size = 0
for url in urls:
_, _, temp = url_info(url)
size += temp
__all__ = ['cntv_download', 'cntv_download_by_id']
print_info(site_info, title, ext, size)
if not info_only:
# avoid corrupted files - don't merge
download_urls(urls, title, ext, size, output_dir = output_dir, merge = False)
def cntv_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
class CNTV(VideoExtractor):
name = 'CNTV.com'
stream_types = [
{'id': '1', 'video_profile': '1280x720_2000kb/s', 'map_to': 'chapters4'},
{'id': '2', 'video_profile': '1280x720_1200kb/s', 'map_to': 'chapters3'},
{'id': '3', 'video_profile': '640x360_850kb/s', 'map_to': 'chapters2'},
{'id': '4', 'video_profile': '480x270_450kb/s', 'map_to': 'chapters'},
{'id': '5', 'video_profile': '320x180_200kb/s', 'map_to': 'lowChapters'},
]
ep = 'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid={}'
def __init__(self):
super().__init__()
self.api_data = None
def prepare(self, **kwargs):
self.api_data = json.loads(get_content(self.__class__.ep.format(self.vid)))
self.title = self.api_data['title']
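# map each chapter list in the API response to one of the known stream types above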
for s in self.api_data['video']:
for st in self.__class__.stream_types:
if st['map_to'] == s:
urls = self.api_data['video'][s]
src = [u['url'] for u in urls]
stream_data = dict(src=src, size=0, container='mp4', video_profile=st['video_profile'])
self.streams[st['id']] = stream_data
def cntv_download_by_id(rid, **kwargs):
CNTV().download_by_vid(rid, **kwargs)
def cntv_download(url, **kwargs):
if re.match(r'http://tv\.cntv\.cn/video/(\w+)/(\w+)', url):
id = match1(url, r'http://tv\.cntv\.cn/video/\w+/(\w+)')
rid = match1(url, r'http://tv\.cntv\.cn/video/\w+/(\w+)')
elif re.match(r'http(s)?://tv\.cctv\.com/\d+/\d+/\d+/\w+.shtml', url):
rid = r1(r'var guid = "(\w+)"', get_content(url))
elif re.match(r'http://\w+\.cntv\.cn/(\w+/\w+/(classpage/video/)?)?\d+/\d+\.shtml', url) or \
re.match(r'http://\w+.cntv.cn/(\w+/)*VIDE\d+.shtml', url) or \
re.match(r'http://(\w+).cntv.cn/(\w+)/classpage/video/(\d+)/(\d+).shtml', url) or \
re.match(r'http://\w+.cctv.com/\d+/\d+/\d+/\w+.shtml', url) or \
re.match(r'http(s)?://\w+.cctv.com/\d+/\d+/\d+/\w+.shtml', url) or \
re.match(r'http://\w+.cntv.cn/\d+/\d+/\d+/\w+.shtml', url):
id = r1(r'videoCenterId","(\w+)"', get_html(url))
page = get_content(url)
rid = r1(r'videoCenterId","(\w+)"', page)
if rid is None:
guid = re.search(r'guid\s*=\s*"([0-9a-z]+)"', page).group(1)
rid = guid
elif re.match(r'http://xiyou.cntv.cn/v-[\w-]+\.html', url):
id = r1(r'http://xiyou.cntv.cn/v-([\w-]+)\.html', url)
rid = r1(r'http://xiyou.cntv.cn/v-([\w-]+)\.html', url)
else:
raise NotImplementedError(url)
cntv_download_by_id(id, output_dir = output_dir, merge = merge, info_only = info_only)
CNTV().download_by_vid(rid, **kwargs)
site_info = "CNTV.com"
download = cntv_download

View File

@ -0,0 +1,105 @@
#!/usr/bin/env python
__all__ = ['coub_download']
from ..common import *
from ..processor import ffmpeg
from ..util.fs import legitimize
def coub_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_content(url)
try:
json_data = get_coub_data(html)
title, video_url, audio_url = get_title_and_urls(json_data)
video_file_name, video_file_path = get_file_path(merge, output_dir, title, video_url)
audio_file_name, audio_file_path = get_file_path(merge, output_dir, title, audio_url)
download_url(audio_url, merge, output_dir, title, info_only)
download_url(video_url, merge, output_dir, title, info_only)
if not info_only:
try:
fix_coub_video_file(video_file_path)
audio_duration = float(ffmpeg.ffprobe_get_media_duration(audio_file_path))
video_duration = float(ffmpeg.ffprobe_get_media_duration(video_file_path))
loop_file_path = get_loop_file_path(title, output_dir)
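# whichever track is shorter is repeated (via a concat list file) until it covers the longer one, then ffmpeg muxes it with the other track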
single_file_path = audio_file_path
if audio_duration > video_duration:
write_loop_file(round(audio_duration / video_duration), loop_file_path, video_file_name)
else:
single_file_path = video_file_path
write_loop_file(round(video_duration / audio_duration), loop_file_path, audio_file_name)
ffmpeg.ffmpeg_concat_audio_and_video([loop_file_path, single_file_path], title + "_full", "mp4")
cleanup_files([video_file_path, audio_file_path, loop_file_path])
except EnvironmentError as err:
print("Error preparing full coub video. {}".format(err))
except Exception as err:
print("Error while downloading files. {}".format(err))
def write_loop_file(records_number, loop_file_path, file_name):
with open(loop_file_path, 'a') as file:
for i in range(records_number):
file.write("file '{}'\n".format(file_name))
def download_url(url, merge, output_dir, title, info_only):
mime, ext, size = url_info(url)
print_info(site_info, title, mime, size)
if not info_only:
download_urls([url], title, ext, size, output_dir, merge=merge)
def fix_coub_video_file(file_path):
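# overwrite the first two bytes of the downloaded video with null bytes (apparently a workaround for coub's mangled mp4 header)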
with open(file_path, 'r+b') as file:
file.seek(0)
file.write(bytes(2))
def get_title_and_urls(json_data):
title = legitimize(re.sub('[\s*]', "_", json_data['title']))
video_info = json_data['file_versions']['html5']['video']
if 'high' not in video_info:
if 'med' not in video_info:
video_url = video_info['low']['url']
else:
video_url = video_info['med']['url']
else:
video_url = video_info['high']['url']
audio_info = json_data['file_versions']['html5']['audio']
if 'high' not in audio_info:
if 'med' not in audio_info:
audio_url = audio_info['low']['url']
else:
audio_url = audio_info['med']['url']
else:
audio_url = audio_info['high']['url']
return title, video_url, audio_url
def get_coub_data(html):
coub_data = r1(r'<script id=\'coubPageCoubJson\' type=\'text/json\'>([\w\W]+?(?=</script>))</script>', html)
json_data = json.loads(coub_data)
return json_data
def get_file_path(merge, output_dir, title, url):
mime, ext, size = url_info(url)
file_name = get_output_filename([], title, ext, output_dir, merge)
file_path = os.path.join(output_dir, file_name)
return file_name, file_path
def get_loop_file_path(title, output_dir):
return os.path.join(output_dir, get_output_filename([], title, "txt", None, False))
def cleanup_files(files):
for file in files:
os.remove(file)
site_info = "coub.com"
download = coub_download
download_playlist = playlist_not_supported('coub')

View File

@ -3,29 +3,36 @@
__all__ = ['dailymotion_download']
from ..common import *
import urllib.parse
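# rewrite a regular video page URL into the embed player URL, which exposes the 'qualities' JSON used below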
def rebuilt_url(url):
path = urllib.parse.urlparse(url).path
aid = path.split('/')[-1].split('_')[0]
return 'http://www.dailymotion.com/embed/video/{}?autoplay=1'.format(aid)
def dailymotion_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
"""Downloads Dailymotion videos by URL.
"""
html = get_content(url)
html = get_content(rebuilt_url(url))
info = json.loads(match1(html, r'qualities":({.+?}),"'))
title = match1(html, r'"video_title"\s*:\s*"([^"]+)"') or \
match1(html, r'"title"\s*:\s*"([^"]+)"')
title = unicodize(title)
for quality in ['720','480','380','240','auto']:
for quality in ['1080','720','480','380','240','144','auto']:
try:
real_url = info[quality][0]["url"]
real_url = info[quality][1]["url"]
if real_url:
break
except KeyError:
pass
type, ext, size = url_info(real_url)
mime, ext, size = url_info(real_url)
print_info(site_info, title, type, size)
print_info(site_info, title, mime, size)
if not info_only:
download_urls([real_url], title, ext, size, output_dir, merge = merge)
download_urls([real_url], title, ext, size, output_dir=output_dir, merge=merge)
site_info = "Dailymotion.com"
download = dailymotion_download

View File

@ -1,77 +0,0 @@
#!/usr/bin/env python
__all__ = ['dilidili_download']
from ..common import *
from .ckplayer import ckplayer_download
headers = {
'DNT': '1',
'Accept-Encoding': 'gzip, deflate, sdch, br',
'Accept-Language': 'en-CA,en;q=0.8,en-US;q=0.6,zh-CN;q=0.4,zh;q=0.2',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Cache-Control': 'max-age=0',
'Referer': 'http://www.dilidili.com/',
'Connection': 'keep-alive',
'Save-Data': 'on',
}
#----------------------------------------------------------------------
def dilidili_parser_data_to_stream_types(typ, vid, hd2, sign, tmsign, ulk):
"""->list"""
parse_url = 'http://player.005.tv/parse.php?xmlurl=null&type={typ}&vid={vid}&hd={hd2}&sign={sign}&tmsign={tmsign}&userlink={ulk}'.format(typ = typ, vid = vid, hd2 = hd2, sign = sign, tmsign = tmsign, ulk = ulk)
html = get_content(parse_url, headers=headers)
info = re.search(r'(\{[^{]+\})(\{[^{]+\})(\{[^{]+\})(\{[^{]+\})(\{[^{]+\})', html).groups()
info = [i.strip('{}').split('->') for i in info]
info = {i[0]: i[1] for i in info}
stream_types = []
for i in zip(info['deft'].split('|'), info['defa'].split('|')):
stream_types.append({'id': str(i[1][-1]), 'container': 'mp4', 'video_profile': i[0]})
return stream_types
#----------------------------------------------------------------------
def dilidili_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
if re.match(r'http://www.dilidili.com/watch\S+', url):
html = get_content(url)
title = match1(html, r'<title>(.+)丨(.+)</title>') #title
# player loaded via internal iframe
frame_url = re.search(r'<iframe src=\"(.+?)\"', html).group(1)
#print(frame_url)
#https://player.005.tv:60000/?vid=a8760f03fd:a04808d307&v=yun&sign=a68f8110cacd892bc5b094c8e5348432
html = get_content(frame_url, headers=headers, decoded=False).decode('utf-8')
match = re.search(r'(.+?)var video =(.+?);', html)
vid = match1(html, r'var vid="(.+)"')
hd2 = match1(html, r'var hd2="(.+)"')
typ = match1(html, r'var typ="(.+)"')
sign = match1(html, r'var sign="(.+)"')
tmsign = match1(html, r'tmsign=([A-Za-z0-9]+)')
ulk = match1(html, r'var ulk="(.+)"')
# here's the parser...
stream_types = dilidili_parser_data_to_stream_types(typ, vid, hd2, sign, tmsign, ulk)
#get best
best_id = max([i['id'] for i in stream_types])
parse_url = 'http://player.005.tv/parse.php?xmlurl=null&type={typ}&vid={vid}&hd={hd2}&sign={sign}&tmsign={tmsign}&userlink={ulk}'.format(typ = typ, vid = vid, hd2 = best_id, sign = sign, tmsign = tmsign, ulk = ulk)
ckplayer_download(parse_url, output_dir, merge, info_only, is_xml = True, title = title, headers = headers)
#type_ = ''
#size = 0
#type_, ext, size = url_info(url)
#print_info(site_info, title, type_, size)
#if not info_only:
#download_urls([url], title, ext, total_size=None, output_dir=output_dir, merge=merge)
site_info = "dilidili"
download = dilidili_download
download_playlist = playlist_not_supported('dilidili')

View File

@ -1,55 +0,0 @@
# -*- coding: utf-8 -*-
__all__ = ['dongting_download']
from ..common import *
_unit_prefixes = 'bkmg'
def parse_size(size):
m = re.match(r'([\d.]+)(.(?:i?B)?)', size, re.I)
if m:
return int(float(m.group(1)) * 1024 **
_unit_prefixes.index(m.group(2).lower()))
else:
return 0
def dongting_download_lyric(lrc_url, file_name, output_dir):
j = get_html(lrc_url)
info = json.loads(j)
lrc = info['data']['lrc']
filename = get_filename(file_name)
with open(output_dir + "/" + filename + '.lrc', 'w', encoding='utf-8') as x:
x.write(lrc)
def dongting_download_song(sid, output_dir = '.', merge = True, info_only = False):
j = get_html('http://ting.hotchanson.com/detail.do?neid=%s&size=0' % sid)
info = json.loads(j)
song_title = info['data']['songName']
album_name = info['data']['albumName']
artist = info['data']['singerName']
ext = 'mp3'
size = parse_size(info['data']['itemList'][-1]['size'])
url = info['data']['itemList'][-1]['downUrl']
print_info(site_info, song_title, ext, size)
if not info_only:
file_name = "%s - %s - %s" % (song_title, album_name, artist)
download_urls([url], file_name, ext, size, output_dir, merge = merge)
lrc_url = ('http://lp.music.ttpod.com/lrc/down?'
'lrcid=&artist=%s&title=%s') % (
parse.quote(artist), parse.quote(song_title))
try:
dongting_download_lyric(lrc_url, file_name, output_dir)
except:
pass
def dongting_download(url, output_dir = '.', stream_type = None, merge = True, info_only = False, **kwargs):
if re.match('http://www.dongting.com/\?song_id=\d+', url):
id = r1(r'http://www.dongting.com/\?song_id=(\d+)', url)
dongting_download_song(id, output_dir, merge, info_only)
site_info = "Dongting.com"
download = dongting_download
download_playlist = playlist_not_supported("dongting")

View File

@ -7,7 +7,18 @@ from ..common import *
def douban_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
html = get_html(url)
if 'subject' in url:
if re.match(r'https?://movie', url):
title = match1(html, 'name="description" content="([^"]+)')
tid = match1(url, 'trailer/(\d+)')
real_url = 'https://movie.douban.com/trailer/video_url?tid=%s' % tid
type, ext, size = url_info(real_url)
print_info(site_info, title, type, size)
if not info_only:
download_urls([real_url], title, ext, size, output_dir, merge = merge)
elif 'subject' in url:
titles = re.findall(r'data-title="([^"]*)">', html)
song_id = re.findall(r'<li class="song-item" id="([^"]*)"', html)
song_ssid = re.findall(r'data-ssid="([^"]*)"', html)

View File

@ -0,0 +1,46 @@
# coding=utf-8
import re
import json
from ..common import (
url_size,
print_info,
get_content,
fake_headers,
download_urls,
playlist_not_supported,
)
__all__ = ['douyin_download_by_url']
def douyin_download_by_url(url, **kwargs):
page_content = get_content(url, headers=fake_headers)
match_rule = re.compile(r'var data = \[(.*?)\];')
video_info = json.loads(match_rule.findall(page_content)[0])
video_url = video_info['video']['play_addr']['url_list'][0]
# fix: https://www.douyin.com/share/video/6553248251821165832
# if there is no title, use desc
cha_list = video_info['cha_list']
if cha_list:
title = cha_list[0]['cha_name']
else:
title = video_info['desc']
video_format = 'mp4'
size = url_size(video_url, faker=True)
print_info(
site_info='douyin.com', title=title,
type=video_format, size=size
)
if not kwargs['info_only']:
download_urls(
urls=[video_url], title=title, ext=video_format, total_size=size,
faker=True,
**kwargs
)
download = douyin_download_by_url
download_playlist = playlist_not_supported('douyin')

View File

@ -3,53 +3,79 @@
__all__ = ['douyutv_download']
from ..common import *
from ..util.log import *
import json
import hashlib
import time
import uuid
import urllib.parse, urllib.request
import re
headers = {
'user-agent': 'Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4'
}
def douyutv_video_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
ep = 'http://vmobile.douyu.com/video/getInfo?vid='
patt = r'show/([0-9A-Za-z]+)'
title_patt = r'<h1>(.+?)</h1>'
hit = re.search(patt, url)
if hit is None:
log.wtf('Unknown url pattern')
vid = hit.group(1)
page = get_content(url, headers=headers)
hit = re.search(title_patt, page)
if hit is None:
title = vid
else:
title = hit.group(1)
meta = json.loads(get_content(ep + vid))
if meta['error'] != 0:
log.wtf('Error from API server')
m3u8_url = meta['data']['video_url']
print_info('Douyu Video', title, 'm3u8', 0, m3u8_url=m3u8_url)
if not info_only:
urls = general_m3u8_extractor(m3u8_url)
download_urls(urls, title, 'ts', 0, output_dir=output_dir, merge=merge, **kwargs)
def douyutv_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_content(url)
room_id_patt = r'"room_id"\s*:\s*(\d+),'
if 'v.douyu.com/show/' in url:
douyutv_video_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
return
url = re.sub(r'.*douyu.com','https://m.douyu.com/room', url)
html = get_content(url, headers)
room_id_patt = r'"rid"\s*:\s*(\d+),'
room_id = match1(html, room_id_patt)
if room_id == "0":
room_id = url[url.rfind('/') + 1:]
json_request_url = "http://m.douyu.com/html5/live?roomId=%s" % room_id
content = get_content(json_request_url)
data = json.loads(content)['data']
server_status = data.get('error',0)
if server_status is not 0:
api_url = "http://www.douyutv.com/api/v1/"
args = "room/%s?aid=wp&client_sys=wp&time=%d" % (room_id, int(time.time()))
auth_md5 = (args + "zNzMV1y4EMxOHS6I5WKm").encode("utf-8")
auth_str = hashlib.md5(auth_md5).hexdigest()
json_request_url = "%s%s&auth=%s" % (api_url, args, auth_str)
content = get_content(json_request_url, headers)
json_content = json.loads(content)
data = json_content['data']
server_status = json_content.get('error', 0)
if server_status != 0:
raise ValueError("Server returned error:%s" % server_status)
title = data.get('room_name')
show_status = data.get('show_status')
if show_status is not "1":
if show_status != "1":
raise ValueError("The live stream is not online! (Errno:%s)" % server_status)
tt = int(time.time() / 60)
did = uuid.uuid4().hex.upper()
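# request signature: md5 over room_id + device id + a fixed salt + the timestamp in minutes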
sign_content = '{room_id}{did}A12Svb&%1UUmf@hC{tt}'.format(room_id = room_id, did = did, tt = tt)
sign = hashlib.md5(sign_content.encode('utf-8')).hexdigest()
json_request_url = "http://www.douyu.com/lapi/live/getPlay/%s" % room_id
payload = {'cdn': 'ws', 'rate': '0', 'tt': tt, 'did': did, 'sign': sign}
postdata = urllib.parse.urlencode(payload)
req = urllib.request.Request(json_request_url, postdata.encode('utf-8'))
with urllib.request.urlopen(req) as response:
content = response.read()
data = json.loads(content.decode('utf-8'))['data']
server_status = data.get('error',0)
if server_status != 0:
raise ValueError("Server returned error:%s" % server_status)
real_url = data.get('rtmp_url') + '/' + data.get('rtmp_live')
print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_url_ffmpeg(real_url, title, 'flv', None, output_dir = output_dir, merge = merge)
download_url_ffmpeg(real_url, title, 'flv', params={}, output_dir=output_dir, merge=merge)
site_info = "douyu.com"
download = douyutv_download

View File

@ -1,7 +1,11 @@
__all__ = ['embed_download']
import urllib.parse
from ..common import *
from .bilibili import bilibili_download
from .dailymotion import dailymotion_download
from .iqiyi import iqiyi_download_by_vid
from .le import letvcloud_download_by_vu
from .netease import netease_download
@ -11,6 +15,8 @@ from .tudou import tudou_download_by_id
from .vimeo import vimeo_download_by_id
from .yinyuetai import yinyuetai_download_by_id
from .youku import youku_download_by_vid
from . import iqiyi
from . import bokecc
"""
refer to http://open.youku.com/tools
@ -25,7 +31,7 @@ youku_embed_patterns = [ 'youku\.com/v_show/id_([a-zA-Z0-9=]+)',
"""
http://www.tudou.com/programs/view/html5embed.action?type=0&amp;code=3LS_URGvl54&amp;lcode=&amp;resourceId=0_06_05_99
"""
tudou_embed_patterns = [ 'tudou\.com[a-zA-Z0-9\/\?=\&\.\;]+code=([a-zA-Z0-9_]+)\&',
tudou_embed_patterns = [ 'tudou\.com[a-zA-Z0-9\/\?=\&\.\;]+code=([a-zA-Z0-9_-]+)\&',
'www\.tudou\.com/v/([a-zA-Z0-9_-]+)/[^"]*v\.swf'
]
@ -42,6 +48,24 @@ netease_embed_patterns = [ '(http://\w+\.163\.com/movie/[^\'"]+)' ]
vimeo_embed_patters = [ 'player\.vimeo\.com/video/(\d+)' ]
dailymotion_embed_patterns = [ 'www\.dailymotion\.com/embed/video/(\w+)' ]
"""
check the share button on http://www.bilibili.com/video/av5079467/
"""
bilibili_embed_patterns = [ 'static\.hdslb\.com/miniloader\.swf.*aid=(\d+)' ]
'''
http://open.iqiyi.com/lib/player.html
'''
iqiyi_patterns = [r'(?:\"|\')(https?://dispatcher\.video\.qiyi\.com\/disp\/shareplayer\.swf\?.+?)(?:\"|\')',
r'(?:\"|\')(https?://open\.iqiyi\.com\/developer\/player_js\/coopPlayerIndex\.html\?.+?)(?:\"|\')']
bokecc_patterns = [r'bokecc\.com/flash/pocle/player\.swf\?siteid=(.+?)&vid=(.{32})']
recur_limit = 3
def embed_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
content = get_content(url, headers=fake_headers)
@ -51,35 +75,78 @@ def embed_download(url, output_dir = '.', merge = True, info_only = False ,**kwa
vids = matchall(content, youku_embed_patterns)
for vid in set(vids):
found = True
youku_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
youku_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
vids = matchall(content, tudou_embed_patterns)
for vid in set(vids):
found = True
tudou_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
tudou_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
vids = matchall(content, yinyuetai_embed_patterns)
for vid in vids:
found = True
yinyuetai_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
yinyuetai_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
vids = matchall(content, iqiyi_embed_patterns)
for vid in vids:
found = True
iqiyi_download_by_vid((vid[1], vid[0]), title=title, output_dir=output_dir, merge=merge, info_only=info_only)
iqiyi_download_by_vid((vid[1], vid[0]), title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
urls = matchall(content, netease_embed_patterns)
for url in urls:
found = True
netease_download(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
netease_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
urls = matchall(content, vimeo_embed_patters)
for url in urls:
found = True
vimeo_download_by_id(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
vimeo_download_by_id(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only, referer=url, **kwargs)
if not found:
urls = matchall(content, dailymotion_embed_patterns)
for url in urls:
found = True
dailymotion_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
aids = matchall(content, bilibili_embed_patterns)
for aid in aids:
found = True
url = 'http://www.bilibili.com/video/av%s/' % aid
bilibili_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
iqiyi_urls = matchall(content, iqiyi_patterns)
for url in iqiyi_urls:
found = True
iqiyi.download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
bokecc_metas = matchall(content, bokecc_patterns)
for meta in bokecc_metas:
found = True
bokecc.bokecc_download_by_id(meta[1], output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
if found:
return True
# Try harder, check all iframes
if 'recur_lv' not in kwargs or kwargs['recur_lv'] < recur_limit:
r = kwargs.get('recur_lv')
if r is None:
r = 1
else:
r += 1
iframes = matchall(content, [r'<iframe.+?src=(?:\"|\')(.*?)(?:\"|\')'])
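# resolve relative iframe src values against the page URL and recurse into each, up to recur_limit levels deep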
for iframe in iframes:
if not iframe.startswith('http'):
src = urllib.parse.urljoin(url, iframe)
else:
src = iframe
found = embed_download(src, output_dir=output_dir, merge=merge, info_only=info_only, recur_lv=r, **kwargs)
if found:
return True
if not found and 'recur_lv' not in kwargs:
raise NotImplementedError(url)
else:
return found
site_info = "any.any"
download = embed_download

View File

@ -6,16 +6,21 @@ from ..common import *
import json
def facebook_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
url = re.sub(r'//.*?facebook.com','//facebook.com',url)
html = get_html(url)
title = r1(r'<title id="pageTitle">(.+)</title>', html)
if title is None:
title = url
sd_urls = list(set([
unicodize(str.replace(i, '\\/', '/'))
for i in re.findall(r'"sd_src_no_ratelimit":"([^"]*)"', html)
for i in re.findall(r'sd_src_no_ratelimit:"([^"]*)"', html)
]))
hd_urls = list(set([
unicodize(str.replace(i, '\\/', '/'))
for i in re.findall(r'"hd_src_no_ratelimit":"([^"]*)"', html)
for i in re.findall(r'hd_src_no_ratelimit:"([^"]*)"', html)
]))
urls = hd_urls if hd_urls else sd_urls

View File

@ -1,39 +1,228 @@
#!/usr/bin/env python
__all__ = ['flickr_download']
__all__ = ['flickr_download_main']
from ..common import *
def flickr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
page = get_html(url)
title = match1(page, r'<meta property="og:title" content="([^"]*)"')
photo_id = match1(page, r'"id":"([0-9]+)"')
import json
try: # extract video
html = get_html('https://secure.flickr.com/apps/video/video_mtl_xml.gne?photo_id=%s' % photo_id)
node_id = match1(html, r'<Item id="id">(.+)</Item>')
secret = match1(html, r'<Item id="photo_secret">(.+)</Item>')
pattern_url_photoset = r'https?://www\.flickr\.com/photos/.+/(?:(?:sets)|(?:albums))?/([^/]+)'
pattern_url_photostream = r'https?://www\.flickr\.com/photos/([^/]+)(?:/|(?:/page))?$'
pattern_url_single_photo = r'https?://www\.flickr\.com/photos/[^/]+/(\d+)'
pattern_url_gallery = r'https?://www\.flickr\.com/photos/[^/]+/galleries/(\d+)'
pattern_url_group = r'https?://www\.flickr\.com/groups/([^/]+)'
pattern_url_favorite = r'https?://www\.flickr\.com/photos/([^/]+)/favorites'
html = get_html('https://secure.flickr.com/video_playlist.gne?node_id=%s&secret=%s' % (node_id, secret))
app = match1(html, r'APP="([^"]+)"')
fullpath = unescape_html(match1(html, r'FULLPATH="([^"]+)"'))
url = app + fullpath
pattern_inline_title = r'<title>([^<]*)</title>'
pattern_inline_api_key = r'api\.site_key\s*=\s*"([^"]+)"'
pattern_inline_img_url = r'"url":"([^"]+)","key":"[^"]+"}}'
pattern_inline_NSID = r'"nsid"\s*:\s*"([^"]+)"'
pattern_inline_video_mark = r'("mediaType":"video")'
# (api_key, method, ext, page)
tmpl_api_call = (
'https://api.flickr.com/services/rest?'
'&format=json&nojsoncallback=1'
# UNCOMMENT FOR TESTING
#'&per_page=5'
'&per_page=500'
# this parameter CANNOT control 'flickr.galleries.getPhotos',
# even though the doc says it should;
# it is always treated as 500
'&api_key=%s'
'&method=flickr.%s'
'&extras=url_sq,url_q,url_t,url_s,url_n,url_m,url_z,url_c,url_l,url_h,url_k,url_o,media'
'%s&page=%d'
)
tmpl_api_call_video_info = (
'https://api.flickr.com/services/rest?'
'&format=json&nojsoncallback=1'
'&method=flickr.video.getStreamInfo'
'&api_key=%s'
'&photo_id=%s'
'&secret=%s'
)
tmpl_api_call_photo_info = (
'https://api.flickr.com/services/rest?'
'&format=json&nojsoncallback=1'
'&method=flickr.photos.getInfo'
'&api_key=%s'
'&photo_id=%s'
)
# it looks like flickr won't return urls for all the sizes
# requested in the 'extras' field without an acceptable request header
dummy_header = {
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0'
}
def get_content_headered(url):
return get_content(url, dummy_header)
def get_photoset_id(url, page):
return match1(url, pattern_url_photoset)
def get_photo_id(url, page):
return match1(url, pattern_url_single_photo)
def get_gallery_id(url, page):
return match1(url, pattern_url_gallery)
def get_api_key(page):
match = match1(page, pattern_inline_api_key)
# this happens only when the url points to a gallery page
# that contains no inline api_key (and never makes XHR API calls)
# in fact this might be a better approach for getting a temporary api key
# since there's no place for a user to add custom information that may
# mislead the regex on the homepage
if not match:
return match1(get_html('https://flickr.com'), pattern_inline_api_key)
return match
def get_NSID(url, page):
return match1(page, pattern_inline_NSID)
# [
# (
# regex_match_url,
# remote_api_method,
# additional_query_parameter_for_method,
# parser_for_additional_parameter,
# field_where_photourls_are_saved
# )
# ]
url_patterns = [
# www.flickr.com/photos/{username|NSID}/sets|albums/{album-id}
(
pattern_url_photoset,
'photosets.getPhotos',
'photoset_id',
get_photoset_id,
'photoset'
),
# www.flickr.com/photos/{username|NSID}/{pageN}?
(
pattern_url_photostream,
# according to flickr api documentation, this method needs to be
# authenticated in order to filter photo visible to the calling user
# but it seems works fine anonymously as well
'people.getPhotos',
'user_id',
get_NSID,
'photos'
),
# www.flickr.com/photos/{username|NSID}/galleries/{gallery-id}
(
pattern_url_gallery,
'galleries.getPhotos',
'gallery_id',
get_gallery_id,
'photos'
),
# www.flickr.com/groups/{groupname|groupNSID}/
(
pattern_url_group,
'groups.pools.getPhotos',
'group_id',
get_NSID,
'photos'
),
# www.flickr.com/photos/{username|NSID}/favorites/*
(
pattern_url_favorite,
'favorites.getList',
'user_id',
get_NSID,
'photos'
)
]
def flickr_download_main(url, output_dir = '.', merge = False, info_only = False, **kwargs):
urls = None
size = 'o' # works for collections only
title = None
if 'stream_id' in kwargs:
size = kwargs['stream_id']
if match1(url, pattern_url_single_photo):
url, title = get_single_photo_url(url)
urls = [url]
else:
urls, title = fetch_photo_url_list(url, size)
index = 0
for url in urls:
mime, ext, size = url_info(url)
print_info(site_info, title, mime, size)
print_info('Flickr.com', title, mime, size)
if not info_only:
download_urls([url], title, ext, size, output_dir, merge=merge, faker=True)
suffix = '[%d]' % index
download_urls([url], title + suffix, ext, False, output_dir, None, False, False)
index = index + 1
except: # extract images
image = match1(page, r'<meta property="og:image" content="([^"]*)')
ext = 'jpg'
_, _, size = url_info(image)
def fetch_photo_url_list(url, size):
for pattern in url_patterns:
# FIXME: fix multiple matching since the match group is dropped
if match1(url, pattern[0]):
return fetch_photo_url_list_impl(url, size, *pattern[1:])
raise NotImplementedError('Flickr extractor is not supported for %s.' % url)
print_info(site_info, title, ext, size)
if not info_only:
download_urls([image], title, ext, size, output_dir, merge=merge)
def fetch_photo_url_list_impl(url, size, method, id_field, id_parse_func, collection_name):
page = get_html(url)
api_key = get_api_key(page)
ext_field = ''
if id_parse_func:
ext_field = '&%s=%s' % (id_field, id_parse_func(url, page))
page_number = 1
urls = []
while True:
call_url = tmpl_api_call % (api_key, method, ext_field, page_number)
photoset = json.loads(get_content_headered(call_url))[collection_name]
pagen = photoset['page']
pages = photoset['pages']
for info in photoset['photo']:
url = get_url_of_largest(info, api_key, size)
urls.append(url)
page_number = page_number + 1
# the type of 'page' and 'pages' may differ between methods
if str(pagen) == str(pages):
break
return urls, match1(page, pattern_inline_title)
# image size suffixes used in inline json 'key' field
# listed in descending order
size_suffixes = ['o', 'k', 'h', 'l', 'c', 'z', 'm', 'n', 's', 't', 'q', 'sq']
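# Illustrative note (not part of the extractor): when a stream id such as 'c'
# is requested, get_url_of_largest() below slices this list from that suffix
# onwards and picks the first 'url_<suffix>' present in the photo info, e.g.
#   size_suffixes[size_suffixes.index('c'):]
#   -> ['c', 'z', 'm', 'n', 's', 't', 'q', 'sq']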
def get_orig_video_source(api_key, pid, secret):
parsed = json.loads(get_content_headered(tmpl_api_call_video_info % (api_key, pid, secret)))
for stream in parsed['streams']['stream']:
if stream['type'] == 'orig':
return stream['_content'].replace('\\', '')
return None
def get_url_of_largest(info, api_key, size):
if info['media'] == 'photo':
sizes = size_suffixes
if size in sizes:
sizes = sizes[sizes.index(size):]
for suffix in sizes:
if 'url_' + suffix in info:
return info['url_' + suffix].replace('\\', '')
return None
else:
return get_orig_video_source(api_key, info['id'], info['secret'])
def get_single_photo_url(url):
page = get_html(url)
pid = get_photo_id(url, page)
title = match1(page, pattern_inline_title)
if match1(page, pattern_inline_video_mark):
api_key = get_api_key(page)
reply = get_content(tmpl_api_call_photo_info % (api_key, get_photo_id(url, page)))
secret = json.loads(reply)['photo']['secret']
return get_orig_video_source(api_key, pid, secret), title
# the last match always has the best resolution
match = match1(page, pattern_inline_img_url)
return 'https:' + match.replace('\\', ''), title
site_info = "Flickr.com"
download = flickr_download
download_playlist = playlist_not_supported('flickr')
download = flickr_download_main
download_playlist = playlist_not_supported('flickr')

View File

@ -1,150 +1,223 @@
#!/usr/bin/env python
import json
import urllib.parse
import base64
import binascii
import re
from ..extractors import VideoExtractor
from ..util import log
from ..common import get_content, playlist_not_supported
__all__ = ['funshion_download']
from ..common import *
import urllib.error
import json
#----------------------------------------------------------------------
def funshion_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
""""""
if re.match(r'http://www.fun.tv/vplay/v-(\w+)', url): #single video
funshion_download_by_url(url, output_dir=output_dir, merge=merge, info_only=info_only)
elif re.match(r'http://www.fun.tv/vplay/.*g-(\w+)', url): #whole drama
funshion_download_by_drama_url(url, output_dir=output_dir, merge=merge, info_only=info_only)
class KBaseMapping:
def __init__(self, base=62):
self.base = base
mapping_table = [str(num) for num in range(10)]
for i in range(26):
mapping_table.append(chr(i + ord('a')))
for i in range(26):
mapping_table.append(chr(i + ord('A')))
self.mapping_table = mapping_table[:self.base]
def mapping(self, num):
res = []
while num > 0:
res.append(self.mapping_table[num % self.base])
num = num // self.base
return ''.join(res[::-1])
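# Illustrative sketch (not used at runtime): with the default base of 62 the
# table runs 0-9, a-z, A-Z, which matches the symbol alphabet of the packed
# js that fetch_magic() below unpacks, e.g.
#   KBaseMapping(62).mapping(61)  -> 'Z'
#   KBaseMapping(62).mapping(62)  -> '10'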
class Funshion(VideoExtractor):
name = "funshion"
stream_types = [
{'id': 'sdvd'},
{'id': 'sdvd_h265'},
{'id': 'hd'},
{'id': 'hd_h265'},
{'id': 'dvd'},
{'id': 'dvd_h265'},
{'id': 'tv'},
{'id': 'tv_h265'}
]
a_mobile_url = 'http://m.fun.tv/implay/?mid=302555'
video_ep = 'http://pv.funshion.com/v7/video/play/?id={}&cl=mweb&uc=111'
media_ep = 'http://pm.funshion.com/v7/media/play/?id={}&cl=mweb&uc=111'
coeff = None
@classmethod
def fetch_magic(cls, url):
def search_dict(a_dict, target):
for key, val in a_dict.items():
if val == target:
return key
magic_list = []
page = get_content(url)
src = re.findall(r'src="(.+?)"', page)
js = [path for path in src if path.endswith('.js')]
host = 'http://' + urllib.parse.urlparse(url).netloc
js_path = [urllib.parse.urljoin(host, rel_path) for rel_path in js]
for p in js_path:
if 'mtool' in p or 'mcore' in p:
js_text = get_content(p)
hit = re.search(r'\(\'(.+?)\',(\d+),(\d+),\'(.+?)\'\.split\(\'\|\'\),\d+,\{\}\)', js_text)
code = hit.group(1)
base = hit.group(2)
size = hit.group(3)
names = hit.group(4).split('|')
mapping = KBaseMapping(base=int(base))
sym_to_name = {}
for no in range(int(size), 0, -1):
no_in_base = mapping.mapping(no)
val = names[no] if no < len(names) and names[no] else no_in_base
sym_to_name[no_in_base] = val
moz_ec_name = search_dict(sym_to_name, 'mozEcName')
push = search_dict(sym_to_name, 'push')
patt = '{}\.{}\("(.+?)"\)'.format(moz_ec_name, push)
ec_list = re.findall(patt, code)
[magic_list.append(sym_to_name[ec]) for ec in ec_list]
return magic_list
@classmethod
def get_coeff(cls, magic_list):
magic_set = set(magic_list)
no_dup = []
for item in magic_list:
if item in magic_set:
magic_set.remove(item)
no_dup.append(item)
# really necessary?
coeff = [0, 0, 0, 0]
for num_pair in no_dup:
idx = int(num_pair[-1])
val = int(num_pair[:-1], 16)
coeff[idx] = val
return coeff
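# Worked example of the recovery above (hypothetical magic strings, not real
# Funshion values): each entry stores its slot in the last character and its
# value in hex before it, and duplicates are dropped, so
#   Funshion.get_coeff(['a10', 'b21', 'c32', 'd43', 'a10'])  -> [161, 178, 195, 212]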
@classmethod
def funshion_decrypt(cls, a_bytes, coeff):
res_list = []
pos = 0
while pos < len(a_bytes):
a = a_bytes[pos]
if pos == len(a_bytes) - 1:
res_list.append(a)
pos += 1
else:
return
b = a_bytes[pos + 1]
m = a * coeff[0] + b * coeff[2]
n = a * coeff[1] + b * coeff[3]
res_list.append(m & 0xff)
res_list.append(n & 0xff)
pos += 2
return bytes(res_list).decode('utf8')
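# The pair branch above applies a 2x2 linear transform to each byte pair (a, b):
#   m = a*coeff[0] + b*coeff[2],  n = a*coeff[1] + b*coeff[3],  both taken mod 256.
# With the identity coefficients [1, 0, 0, 1] a byte string passes through
# unchanged (illustrative sanity check only, not a real Funshion key):
#   Funshion.funshion_decrypt(b'abc', [1, 0, 0, 1])  -> 'abc'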
# Logic for single video, up to the drama section
#----------------------------------------------------------------------
def funshion_download_by_url(url, output_dir = '.', merge = False, info_only = False):
"""lots of stuff->None
Main wrapper for single video download.
"""
@classmethod
def funshion_decrypt_str(cls, a_str, coeff):
# r'.{27}0' pattern, untested
if len(a_str) == 28 and a_str[-1] == '0':
data_bytes = base64.b64decode(a_str[:27] + '=')
clear = cls.funshion_decrypt(data_bytes, coeff)
return binascii.hexlify(clear.encode('utf8')).upper()
data_bytes = base64.b64decode(a_str[2:])
return cls.funshion_decrypt(data_bytes, coeff)
@classmethod
def checksum(cls, sha1_str):
if len(sha1_str) != 41:
return False
if not re.match(r'[0-9A-Za-z]{41}', sha1_str):
return False
sha1 = sha1_str[:-1]
if (15 & sum([int(char, 16) for char in sha1])) == int(sha1_str[-1], 16):
return True
return False
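# The rule above: the 41st hex digit must equal the low 4 bits of the sum of
# the first 40 hex digits. A minimal illustrative value (not a real infohash):
#   Funshion.checksum('a' * 40 + '0')  -> True    # 40 * 0xa = 400, and 400 & 15 == 0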
@classmethod
def get_cdninfo(cls, hashid):
url = 'http://jobsfe.funshion.com/query/v1/mp4/{}.json'.format(hashid)
meta = json.loads(get_content(url, decoded=False).decode('utf8'))
return meta['playlist'][0]['urls']
@classmethod
def dec_playinfo(cls, info, coeff):
res = None
clear = cls.funshion_decrypt_str(info['infohash'], coeff)
if cls.checksum(clear):
res = dict(hashid=clear[:40], token=cls.funshion_decrypt_str(info['token'], coeff))
else:
clear = cls.funshion_decrypt_str(info['infohash_prev'], coeff)
if cls.checksum(clear):
res = dict(hashid=clear[:40], token=cls.funshion_decrypt_str(info['token_prev'], coeff))
return res
def prepare(self, **kwargs):
if self.__class__.coeff is None:
magic_list = self.__class__.fetch_magic(self.__class__.a_mobile_url)
self.__class__.coeff = self.__class__.get_coeff(magic_list)
if 'title' not in kwargs:
url = 'http://pv.funshion.com/v5/video/profile/?id={}&cl=mweb&uc=111'.format(self.vid)
meta = json.loads(get_content(url))
self.title = meta['name']
else:
self.title = kwargs['title']
ep_url = self.__class__.video_ep if 'single_video' in kwargs else self.__class__.media_ep
url = ep_url.format(self.vid)
meta = json.loads(get_content(url))
streams = meta['playlist']
for stream in streams:
definition = stream['code']
for s in stream['playinfo']:
codec = 'h' + s['codec'][2:]
# h.264 -> h264
for st in self.__class__.stream_types:
s_id = '{}_{}'.format(definition, codec)
if codec == 'h264':
s_id = definition
if s_id == st['id']:
clear_info = self.__class__.dec_playinfo(s, self.__class__.coeff)
cdn_list = self.__class__.get_cdninfo(clear_info['hashid'])
base_url = cdn_list[0]
vf = urllib.parse.quote(s['vf'])
video_size = int(s['filesize'])
token = urllib.parse.quote(base64.b64encode(clear_info['token'].encode('utf8')))
video_url = '{}?token={}&vf={}'.format(base_url, token, vf)
self.streams[s_id] = dict(size=video_size, src=[video_url], container='mp4')
def funshion_download(url, **kwargs):
if re.match(r'http://www.fun.tv/vplay/v-(\w+)', url):
match = re.search(r'http://www.fun.tv/vplay/v-(\d+)(.?)', url)
vid = match.group(1)
funshion_download_by_vid(vid, output_dir=output_dir, merge=merge, info_only=info_only)
vid = re.search(r'http://www.fun.tv/vplay/v-(\w+)', url).group(1)
Funshion().download_by_vid(vid, single_video=True, **kwargs)
elif re.match(r'http://www.fun.tv/vplay/.*g-(\w+)', url):
epid = re.search(r'http://www.fun.tv/vplay/.*g-(\w+)', url).group(1)
url = 'http://pm.funshion.com/v5/media/episode?id={}&cl=mweb&uc=111'.format(epid)
meta = json.loads(get_content(url))
drama_name = meta['name']
#----------------------------------------------------------------------
def funshion_download_by_vid(vid, output_dir = '.', merge = False, info_only = False):
"""vid->None
Secondary wrapper for single video download.
"""
title = funshion_get_title_by_vid(vid)
url_list = funshion_vid_to_urls(vid)
for url in url_list:
type, ext, size = url_info(url)
print_info(site_info, title, type, size)
if not info_only:
download_urls(url_list, title, ext, total_size=None, output_dir=output_dir, merge=merge)
#----------------------------------------------------------------------
def funshion_get_title_by_vid(vid):
"""vid->str
Single video vid to title."""
html = get_content('http://pv.funshion.com/v5/video/profile?id={vid}&cl=aphone&uc=5'.format(vid = vid))
c = json.loads(html)
return c['name']
#----------------------------------------------------------------------
def funshion_vid_to_urls(vid):
"""str->str
Select one resolution for single video download."""
html = get_content('http://pv.funshion.com/v5/video/play/?id={vid}&cl=aphone&uc=5'.format(vid = vid))
return select_url_from_video_api(html)
# Logic for drama, up to the helper functions
#----------------------------------------------------------------------
def funshion_download_by_drama_url(url, output_dir = '.', merge = False, info_only = False):
"""str->None
url = 'http://www.fun.tv/vplay/g-95785/'
"""
id = r1(r'http://www.fun.tv/vplay/.*g-(\d+)', url)
video_list = funshion_drama_id_to_vid(id)
for video in video_list:
funshion_download_by_id((video[0], id), output_dir=output_dir, merge=merge, info_only=info_only)
# id is for the drama; this vid is not the same as the one used for a single video
#----------------------------------------------------------------------
def funshion_download_by_id(vid_id_tuple, output_dir = '.', merge = False, info_only = False):
"""single_episode_id, drama_id->None
Secondary wrapper for single drama video download.
"""
(vid, id) = vid_id_tuple
title = funshion_get_title_by_id(vid, id)
url_list = funshion_id_to_urls(vid)
for url in url_list:
type, ext, size = url_info(url)
print_info(site_info, title, type, size)
if not info_only:
download_urls(url_list, title, ext, total_size=None, output_dir=output_dir, merge=merge)
#----------------------------------------------------------------------
def funshion_drama_id_to_vid(episode_id):
"""int->[(int,int),...]
id: 95785
->[('626464', '1'), ('626466', '2'), ('626468', '3'),...
Drama ID to vids used in drama.
**THIS VID IS NOT THE SAME AS THE ONE USED FOR A SINGLE VIDEO!!**
"""
html = get_content('http://pm.funshion.com/v5/media/episode?id={episode_id}&cl=aphone&uc=5'.format(episode_id = episode_id))
c = json.loads(html)
#{'definition': [{'name': '流畅', 'code': 'tv'}, {'name': '标清', 'code': 'dvd'}, {'name': '高清', 'code': 'hd'}], 'retmsg': 'ok', 'total': '32', 'sort': '1', 'prevues': [], 'retcode': '200', 'cid': '2', 'template': 'grid', 'episodes': [{'num': '1', 'id': '624728', 'still': None, 'name': '第1集', 'duration': '45:55'}, ], 'name': '太行山上', 'share': 'http://pm.funshion.com/v5/media/share?id=201554&num=', 'media': '201554'}
return [(i['id'], i['num']) for i in c['episodes']]
#----------------------------------------------------------------------
def funshion_id_to_urls(id):
"""int->list of URL
Select video URL for single drama video.
"""
html = get_content('http://pm.funshion.com/v5/media/play/?id={id}&cl=aphone&uc=5'.format(id = id))
return select_url_from_video_api(html)
#----------------------------------------------------------------------
def funshion_get_title_by_id(single_episode_id, drama_id):
"""single_episode_id, drama_id->str
This is for full drama.
Get title for single drama video."""
html = get_content('http://pm.funshion.com/v5/media/episode?id={id}&cl=aphone&uc=5'.format(id = drama_id))
c = json.loads(html)
for i in c['episodes']:
if i['id'] == str(single_episode_id):
return c['name'] + ' - ' + i['name']
# Helper functions.
#----------------------------------------------------------------------
def select_url_from_video_api(html):
"""str(html)->str(url)
Choose the best one.
Used in both single and drama download.
code definition:
{'tv': 'liuchang',
'dvd': 'biaoqing',
'hd': 'gaoqing',
'sdvd': 'chaoqing'}"""
c = json.loads(html)
#{'retmsg': 'ok', 'retcode': '200', 'selected': 'tv', 'mp4': [{'filename': '', 'http': 'http://jobsfe.funshion.com/query/v1/mp4/7FCD71C58EBD4336DF99787A63045A8F3016EC51.json', 'filesize': '96748671', 'code': 'tv', 'name': '流畅', 'infohash': '7FCD71C58EBD4336DF99787A63045A8F3016EC51'}...], 'episode': '626464'}
video_dic = {}
for i in c['mp4']:
video_dic[i['code']] = i['http']
quality_preference_list = ['sdvd', 'hd', 'dvd', 'sd']
url = [video_dic[quality] for quality in quality_preference_list if quality in video_dic][0]
html = get_html(url)
c = json.loads(html)
#'{"return":"succ","client":{"ip":"107.191.**.**","sp":"0","loc":"0"},"playlist":[{"bits":"1638400","tname":"dvd","size":"555811243","urls":["http:\\/\\/61.155.217.4:80\\/play\\/1E070CE31DAA1373B667FD23AA5397C192CA6F7F.mp4",...]}]}'
return [i['urls'][0] for i in c['playlist']]
extractor = Funshion()
for ep in meta['episodes']:
title = '{}_{}_{}'.format(drama_name, ep['num'], ep['name'])
extractor.download_by_vid(ep['id'], title=title, **kwargs)
else:
log.wtf('Unknown url pattern')
site_info = "funshion"
download = funshion_download

View File

@ -0,0 +1,33 @@
#!/usr/bin/env python
__all__ = ['giphy_download']
from ..common import *
import json
def giphy_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url)
url = list(set([
unicodize(str.replace(i, '\\/', '/'))
for i in re.findall(r'<meta property="og:video:secure_url" content="(.*?)">', html)
]))
title = r1(r'<meta property="og:title" content="(.*?)">', html)
if title is None:
title = url[0]
type, ext, size = url_info(url[0], True)
size = urls_size(url)
type = "video/mp4"
ext = "mp4"
print_info(site_info, title, type, size)
if not info_only:
download_urls(url, title, ext, size, output_dir, merge=False)
site_info = "Giphy.com"
download = giphy_download
download_playlist = playlist_not_supported('giphy')

View File

@ -51,7 +51,7 @@ def google_download(url, output_dir = '.', merge = True, info_only = False, **kw
# attempt to extract images first
# TBD: posts with > 4 images
# TBD: album links
html = get_html(parse.unquote(url))
html = get_html(parse.unquote(url), faker=True)
real_urls = []
for src in re.findall(r'src="([^"]+)"[^>]*itemprop="image"', html):
t = src.split('/')
@ -59,14 +59,15 @@ def google_download(url, output_dir = '.', merge = True, info_only = False, **kw
u = '/'.join(t)
real_urls.append(u)
if not real_urls:
real_urls = [r1(r'<meta property="og:image" content="([^"]+)', html)]
post_date = r1(r'"(20\d\d-[01]\d-[0123]\d)"', html)
real_urls = re.findall(r'<meta property="og:image" content="([^"]+)', html)
real_urls = [re.sub(r'w\d+-h\d+-p', 's0', u) for u in real_urls]
post_date = r1(r'"?(20\d\d[-/]?[01]\d[-/]?[0123]\d)"?', html)
post_id = r1(r'/posts/([^"]+)', html)
title = post_date + "_" + post_id
try:
url = "https://plus.google.com/" + r1(r'"(photos/\d+/albums/\d+/\d+)', html)
html = get_html(url)
url = "https://plus.google.com/" + r1(r'(photos/\d+/albums/\d+/\d+)\?authkey', html)
html = get_html(url, faker=True)
temp = re.findall(r'\[(\d+),\d+,\d+,"([^"]+)"\]', html)
temp = sorted(temp, key = lambda x : fmt_level[x[0]])
urls = [unicodize(i[1]) for i in temp if i[0] == temp[0][0]]
@ -77,7 +78,7 @@ def google_download(url, output_dir = '.', merge = True, info_only = False, **kw
post_author = r1(r'/\+([^/]+)/posts', post_url)
if post_author:
post_url = "https://plus.google.com/+%s/posts/%s" % (parse.quote(post_author), r1(r'posts/(.+)', post_url))
post_html = get_html(post_url)
post_html = get_html(post_url, faker=True)
title = r1(r'<title[^>]*>([^<\n]+)', post_html)
if title is None:
@ -98,20 +99,34 @@ def google_download(url, output_dir = '.', merge = True, info_only = False, **kw
elif service in ['docs', 'drive'] : # Google Docs
html = get_html(url)
html = get_content(url, headers=fake_headers)
title = r1(r'"title":"([^"]*)"', html) or r1(r'<meta itemprop="name" content="([^"]*)"', html)
if len(title.split('.')) > 1:
title = ".".join(title.split('.')[:-1])
docid = r1(r'"docid":"([^"]*)"', html)
docid = r1('/file/d/([^/]+)', url)
request.install_opener(request.build_opener(request.HTTPCookieProcessor()))
request.urlopen(request.Request("https://docs.google.com/uc?id=%s&export=download" % docid))
real_url = "https://docs.google.com/uc?export=download&confirm=no_antivirus&id=%s" % docid
type, ext, size = url_info(real_url)
redirected_url = get_location(real_url)
if real_url != redirected_url:
# tiny file - get real url here
type, ext, size = url_info(redirected_url)
real_url = redirected_url
else:
# huge file - real_url is a confirmation page and the real download url is inside it
confirm_page = get_content(real_url)
hrefs = re.findall(r'href="(.+?)"', confirm_page)
for u in hrefs:
if u.startswith('/uc?export=download'):
rel = unescape_html(u)
confirm_url = 'https://docs.google.com' + rel
real_url = get_location(confirm_url)
_, ext, size = url_info(real_url, headers=fake_headers)
if size is None:
size = 0
print_info(site_info, title, ext, size)
if not info_only:

View File

@ -1,85 +0,0 @@
#!/usr/bin/env python
import json
import os
import re
import math
import traceback
import urllib.parse as urlparse
from ..common import *
__all__ = ['huaban_download']
site_info = '花瓣 (Huaban)'
LIMIT = 100
class Board:
def __init__(self, title, pins):
self.title = title
self.pins = pins
self.pin_count = len(pins)
class Pin:
host = 'http://img.hb.aicdn.com/'
def __init__(self, pin_json):
img_file = pin_json['file']
self.id = str(pin_json['pin_id'])
self.url = urlparse.urljoin(self.host, img_file['key'])
self.ext = img_file['type'].split('/')[-1]
def construct_url(url, **params):
param_str = urlparse.urlencode(params)
return url + '?' + param_str
def extract_json_data(url, **params):
url = construct_url(url, **params)
html = get_content(url, headers=fake_headers)
json_string = match1(html, r'app.page\["board"\] = (.*?});')
json_data = json.loads(json_string)
return json_data
def extract_board_data(url):
json_data = extract_json_data(url, limit=LIMIT)
pin_list = json_data['pins']
title = json_data['title']
pin_count = json_data['pin_count']
pin_count -= len(pin_list)
while pin_count > 0:
json_data = extract_json_data(url, max=pin_list[-1]['pin_id'],
limit=LIMIT)
pins = json_data['pins']
pin_list += pins
pin_count -= len(pins)
return Board(title, list(map(Pin, pin_list)))
def huaban_download_board(url, output_dir, **kwargs):
kwargs['merge'] = False
board = extract_board_data(url)
output_dir = os.path.join(output_dir, board.title)
print_info(site_info, board.title, 'jpg', float('Inf'))
for pin in board.pins:
download_urls([pin.url], pin.id, pin.ext, float('Inf'),
output_dir=output_dir, faker=True, **kwargs)
def huaban_download(url, output_dir='.', **kwargs):
if re.match(r'http://huaban\.com/boards/\d+/', url):
huaban_download_board(url, output_dir, **kwargs)
else:
print('Only board (画板) pages are supported currently')
print('ex: http://huaban.com/boards/12345678/')
download = huaban_download
download_playlist = playlist_not_supported("huaban")

View File

@ -6,7 +6,7 @@ from ..common import *
def get_mobile_room_url(room_id):
return 'http://www.huomao.com/mobile/mob_live?cid=%s' % room_id
return 'http://www.huomao.com/mobile/mob_live/%s' % room_id
def get_m3u8_url(stream_id):

View File

@ -0,0 +1,396 @@
#!/usr/bin/env python
from ..common import *
from urllib import parse, error
import random
from time import sleep
import datetime
import hashlib
import base64
import logging
import re
from xml.dom.minidom import parseString
__all__ = ['icourses_download', 'icourses_playlist_download']
def icourses_download(url, output_dir='.', **kwargs):
if 'showResDetail.action' in url:
hit = re.search(r'id=(\d+)&courseId=(\d+)', url)
url = 'http://www.icourses.cn/jpk/changeforVideo.action?resId={}&courseId={}'.format(hit.group(1), hit.group(2))
if re.match(r'http://www.icourses.cn/coursestatic/course_(\d+).html', url):
raise Exception('You can download it with -l flag')
icourses_parser = ICousesExactor(url=url)
icourses_parser.basic_extract()
title = icourses_parser.title
size = None
for i in range(5):
try:
# use this url only for size
size_url = icourses_parser.generate_url(0)
_, type_, size = url_info(size_url, headers=fake_headers)
except error.HTTPError:
logging.warning('Failed to fetch the video file! Retrying...')
sleep(random.Random().randint(2, 5)) # Prevent being blocked
else:
print_info(site_info, title, type_, size)
break
if size is None:
raise Exception("Failed")
if not kwargs['info_only']:
real_url = icourses_parser.update_url(0)
headers = fake_headers.copy()
headers['Referer'] = url
download_urls_icourses(real_url, title, 'flv',total_size=size, output_dir=output_dir, max_size=15728640, dyn_callback=icourses_parser.update_url)
return
def get_course_title(url, course_type, page=None):
if page is None:
try:
# a shared course page could be gbk-encoded even though it declares charset="utf-8"
page = get_content(url, decoded=False).decode('gbk')
except UnicodeDecodeError:
page = get_content(url, decoded=False).decode('utf8')
if course_type == 'shared_old':
patt = r'<div\s+class="top_left_til">(.+?)<\/div>'
elif course_type == 'shared_new':
patt = r'<h1>(.+?)<\/h1>'
else:
patt = r'<div\s+class="con">(.+?)<\/div>'
return re.search(patt, page).group(1)
def public_course_playlist(url, page=None):
host = 'http://www.icourses.cn/'
patt = r'<a href="(.+?)"\s*title="(.+?)".+?>(?:.|\n)+?</a>'
if page is None:
page = get_content(url)
playlist = re.findall(patt, page)
return [(host+i[0], i[1]) for i in playlist]
def public_course_get_title(url, page=None):
patt = r'<div\s*class="kcslbut">.+?第(\d+)讲'
if page is None:
page = get_content(url)
seq_num = int(re.search(patt, page).group(1)) - 1
course_main_title = get_course_title(url, 'public', page)
return '{}_第{}讲_{}'.format(course_main_title, seq_num+1, public_course_playlist(url, page)[seq_num][1])
def icourses_playlist_download(url, output_dir='.', **kwargs):
page_type_patt = r'showSectionNode\(this,(\d+),(\d+)\)'
resid_courseid_patt = r'changeforvideo\(\'(\d+)\',\'(\d+)\',\'(\d+)\'\)'
ep = 'http://www.icourses.cn/jpk/viewCharacterDetail.action?sectionId={}&courseId={}'
change_for_video_ip = 'http://www.icourses.cn/jpk/changeforVideo.action?resId={}&courseId={}'
video_list = []
if 'viewVCourse' in url:
playlist = public_course_playlist(url)
for video in playlist:
icourses_download(video[0], output_dir=output_dir, **kwargs)
return
elif 'coursestatic' in url:
course_page = get_content(url)
page_navi_vars = re.search(page_type_patt, course_page)
if page_navi_vars is None: # type 2 shared course
video_list = icourses_playlist_new(url, course_page)
else: # type 1 shared course
sec_page = get_content(ep.format(page_navi_vars.group(2), page_navi_vars.group(1)))
video_list = re.findall(resid_courseid_patt, sec_page)
elif 'viewCharacterDetail.action' in url or 'changeforVideo.action' in url:
page = get_content(url)
video_list = re.findall(resid_courseid_patt, page)
if not video_list:
raise Exception('Unknown url pattern')
for video in video_list:
video_url = change_for_video_ip.format(video[0], video[1])
sleep(random.Random().randint(0, 5)) # Prevent being blocked
icourses_download(video_url, output_dir=output_dir, **kwargs)
def icourses_playlist_new(url, page=None):
# 2 helpers using the same interface as the js code
def to_chap(course_id, chap_id, mod):
ep = 'http://www.icourses.cn/jpk/viewCharacterDetail2.action?courseId={}&characId={}&mod={}'
req = post_content(ep.format(course_id, chap_id, mod), post_data={})
return req
def to_sec(course_id, chap_id, mod):
ep = 'http://www.icourses.cn/jpk/viewCharacterDetail2.action?courseId={}&characId={}&mod={}'
req = post_content(ep.format(course_id, chap_id, mod), post_data={})
return req
def show_sec(course_id, chap_id):
ep = 'http://www.icourses.cn/jpk/getSectionNode.action?courseId={}&characId={}&mod=2'
req = post_content(ep.format(course_id, chap_id), post_data={})
return req
if page is None:
page = get_content(url)
chap_patt = r'<h3>.+?id="parent_row_(\d+)".+?onclick="(\w+)\((.+)\)"'
to_chap_patt = r'this,(\d+),(\d+),(\d)'
show_sec_patt = r'this,(\d+),(\d+)'
res_patt = r'res_showResDetail\(\'(\d+)\',\'.+?\',\'\d+\',\'mp4\',\'(\d+)\'\)'
l = re.findall(chap_patt, page)
for i in l:
if i[1] == 'ajaxtocharac':
hit = re.search(to_chap_patt, i[2])
page = to_chap(hit.group(1), hit.group(2), hit.group(3))
hit_list = re.findall(res_patt, page)
if hit_list:
return get_playlist(hit_list[0][0], hit_list[0][1])
for hit in hit_list:
print(hit)
elif i[1] == 'showSectionNode2':
hit = re.search(show_sec_patt, i[2])
page = show_sec(hit.group(1), hit.group(2))
# print(page)
patt = r'ajaxtosection\(this,(\d+),(\d+),(\d+)\)'
hit_list = re.findall(patt, page)
# print(hit_list)
for hit in hit_list:
page = to_sec(hit[0], hit[1], hit[2])
vlist = re.findall(res_patt, page)
if vlist:
return get_playlist(vlist[0][0], vlist[0][1])
raise Exception("No video found in this playlist")
def get_playlist(res_id, course_id):
ep = 'http://www.icourses.cn/jpk/changeforVideo.action?resId={}&courseId={}'
req = get_content(ep.format(res_id, course_id))
patt = r'<a.+?changeforvideo\(\'(\d+)\',\'(\d+)\',\'(\d+)\'\).+?title=\"(.+?)\"'
return re.findall(patt, req)
class ICousesExactor(object):
PLAYER_BASE_VER = '150606-1'
ENCRYPT_MOD_VER = '151020'
ENCRYPT_SALT = '3DAPmXsZ4o' # It took a really long time to find this...
def __init__(self, url):
self.url = url
self.title = ''
self.flashvars = ''
self.api_data = {}
self.media_url = ''
self.common_args = {}
self.enc_mode = True
self.page = get_content(self.url)
return
def get_title(self):
if 'viewVCourse' in self.url:
self.title = public_course_get_title(self.url, self.page)
return
title_a_patt = r'<div class="con"> <a.*?>(.*?)</a>'
title_b_patt = r'<div class="con"> <a.*?/a>((.|\n)*?)</div>'
title_a = match1(self.page, title_a_patt).strip()
title_b = match1(self.page, title_b_patt).strip()
title = title_a + title_b
title = re.sub('( +|\n|\t|\r|&nbsp;)', '', unescape_html(title).replace(' ', ''))
self.title = title
def get_flashvars(self):
patt = r'var flashvars\s*=\s*(\{(?:.|\n)+?\});'
hit = re.search(patt, self.page)
if hit is None:
raise Exception('Cannot find flashvars')
flashvar_str = hit.group(1)
uuid = re.search(r'uuid\s*:\s*\"?(\w+)\"?', flashvar_str).group(1)
other = re.search(r'other\s*:\s*"(.*?)"', flashvar_str).group(1)
isvc = re.search(r'IService\s*:\s*\'(.+?)\'', flashvar_str).group(1)
player_time_patt = r'MPlayer.swf\?v\=(\d+)'
player_time = re.search(player_time_patt, self.page).group(1)
self.flashvars = dict(IService=isvc, uuid=uuid, other=other, v=player_time)
def api_req(self, url):
xml_str = get_content(url)
dom = parseString(xml_str)
status = dom.getElementsByTagName('result')[0].getAttribute('status')
if status != 'success':
raise Exception('API returned fail')
api_res = {}
meta = dom.getElementsByTagName('metadata')
for m in meta:
key = m.getAttribute('name')
val = m.firstChild.nodeValue
api_res[key] = val
self.api_data = api_res
def basic_extract(self):
self.get_title()
self.get_flashvars()
api_req_url = '{}?{}'.format(self.flashvars['IService'], parse.urlencode(self.flashvars))
self.api_req(api_req_url)
def do_extract(self, received=0):
self.basic_extract()
return self.generate_url(received)
def update_url(self, received):
args = self.common_args.copy()
play_type = 'seek' if received else 'play'
received = received if received else -1
args['ls'] = play_type
args['start'] = received + 1
args['lt'] = self.get_date_str()
if self.enc_mode:
ssl_ts, sign = self.get_sign(self.media_url)
extra_args = dict(h=sign, r=ssl_ts, p=self.__class__.ENCRYPT_MOD_VER)
args.update(extra_args)
return '{}?{}'.format(self.media_url, parse.urlencode(args))
@classmethod
def get_date_str(self):
fmt_str = '%-m-%-d/%-H:%-M:%-S'
now = datetime.datetime.now()
try:
date_str = now.strftime(fmt_str)
except ValueError: # msvcrt
date_str = '{}-{}/{}:{}:{}'.format(now.month, now.day, now.hour, now.minute, now.second)
return date_str
def generate_url(self, received):
media_host = self.get_media_host(self.api_data['host'])
media_url = media_host + self.api_data['url']
self.media_url = media_url
common_args = dict(lv=self.__class__.PLAYER_BASE_VER)
h = self.api_data.get('h')
r = self.api_data.get('p', self.__class__.ENCRYPT_MOD_VER)
if self.api_data['ssl'] != 'true':
self.enc_mode = False
common_args.update(dict(h=h, r=r))
else:
self.enc_mode = True
common_args['p'] = self.__class__.ENCRYPT_MOD_VER
self.common_args = common_args
return self.update_url(received)
def get_sign(self, media_url):
media_host = parse.urlparse(media_url).netloc
ran = random.randint(0, 9999999)
ssl_callback = get_content('http://{}/ssl/ssl.shtml?r={}'.format(media_host, ran)).split(',')
ssl_ts = int(datetime.datetime.strptime(ssl_callback[1], "%b %d %H:%M:%S %Y").timestamp() + int(ssl_callback[0]))
sign_this = self.__class__.ENCRYPT_SALT + parse.urlparse(media_url).path + str(ssl_ts)
arg_h = base64.b64encode(hashlib.md5(bytes(sign_this, 'utf-8')).digest(), altchars=b'-_')
return ssl_ts, arg_h.decode('utf-8').strip('=')
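# Sketch of the signing scheme above with hypothetical values: ssl.shtml fixes
# the server-side timestamp ssl_ts, then
#   sign_this = ENCRYPT_SALT + '/path/to/video.flv' + str(ssl_ts)
#   arg_h = base64.b64encode(hashlib.md5(sign_this.encode('utf-8')).digest(), altchars=b'-_')
# and arg_h (with trailing '=' stripped) is sent as the 'h' query parameter by
# update_url().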
def get_media_host(self, ori_host):
res = get_content(ori_host + '/ssl/host.shtml').strip()
path = parse.urlparse(ori_host).path
return ''.join([res, path])
def download_urls_icourses(url, title, ext, total_size, output_dir='.', headers=None, **kwargs):
if dry_run or player:
log.wtf('Non standard protocol')
title = get_filename(title)
filename = '%s.%s' % (title, ext)
filepath = os.path.join(output_dir, filename)
if not force and os.path.exists(filepath):
print('Skipping {}: file already exists\n'.format(filepath))
return
bar = SimpleProgressBar(total_size, 1)
print('Downloading %s ...' % tr(filename))
url_save_icourses(url, filepath, bar, total_size, headers=headers, **kwargs)
bar.done()
print()
def url_save_icourses(url, filepath, bar, total_size, dyn_callback=None, is_part=False, max_size=0, headers=None):
def dyn_update_url(received):
if callable(dyn_callback):
logging.debug('Calling callback %s for new URL from %s' % (dyn_callback.__name__, received))
return dyn_callback(received)
if bar is None:
bar = DummyProgressBar()
if os.path.exists(filepath):
if not force:
if not is_part:
bar.done()
print('Skipping %s: file already exists' % tr(os.path.basename(filepath)))
else:
filesize = os.path.getsize(filepath)
bar.update_received(filesize)
return
else:
if not is_part:
bar.done()
print('Overwriting %s' % os.path.basename(filepath), '...')
elif not os.path.exists(os.path.dirname(filepath)):
os.mkdir(os.path.dirname(filepath))
temp_filepath = filepath + '.download'
received = 0
if not force:
open_mode = 'ab'
if os.path.exists(temp_filepath):
tempfile_size = os.path.getsize(temp_filepath)
received += tempfile_size
bar.update_received(tempfile_size)
else:
open_mode = 'wb'
if received:
url = dyn_update_url(received)
if headers is None:
headers = {}
response = urlopen_with_retry(request.Request(url, headers=headers))
# Do not update content-length here.
# Only the 1st segment's content-length is the content-length of the file.
# For other segments, content-length is the standard one, 15 * 1024 * 1024
with open(temp_filepath, open_mode) as output:
before_this_uri = received
# received - before_this_uri is the size of the buffer fetched from one uri
while True:
update_bs = 256 * 1024
left_bytes = total_size - received
to_read = left_bytes if left_bytes <= update_bs else update_bs
# calc the block size to read -- The server can fail to send an EOF
buffer = response.read(to_read)
if not buffer:
logging.debug('Got EOF from server')
break
output.write(buffer)
received += len(buffer)
bar.update_received(len(buffer))
if received >= total_size:
break
if max_size and (received - before_this_uri) >= max_size:
url = dyn_update_url(received)
before_this_uri = received
response = urlopen_with_retry(request.Request(url, headers=headers))
assert received == os.path.getsize(temp_filepath), '%s == %s' % (received, os.path.getsize(temp_filepath))
if os.access(filepath, os.W_OK):
os.remove(filepath) # on Windows rename could fail if destination filepath exists
os.rename(temp_filepath, filepath)
site_info = 'icourses.cn'
download = icourses_download
download_playlist = icourses_playlist_download

View File

@ -21,12 +21,18 @@ def ifeng_download_by_id(id, title = None, output_dir = '.', merge = True, info_
download_urls([url], title, ext, size, output_dir, merge = merge)
def ifeng_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
id = r1(r'/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\.shtml$', url)
# old pattern /uuid.shtml
# now it could be #uuid
id = r1(r'([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})', url)
if id:
return ifeng_download_by_id(id, None, output_dir = output_dir, merge = merge, info_only = info_only)
html = get_html(url)
html = get_content(url)
uuid_pattern = r'"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"'
id = r1(r'var vid="([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"', html)
if id is None:
video_pattern = r'"vid"\s*:\s*' + uuid_pattern
id = match1(html, video_pattern)
assert id, "can't find video info"
return ifeng_download_by_id(id, None, output_dir = output_dir, merge = merge, info_only = info_only)

View File

@ -52,20 +52,16 @@ class Imgur(VideoExtractor):
else:
# gallery image
content = get_content(self.url)
image = json.loads(match1(content, r'image\s*:\s*({.*}),'))
ext = image['ext']
url = match1(content, r'(https?://i.imgur.com/[^"]+)')
_, container, size = url_info(url)
self.streams = {
'original': {
'src': ['http://i.imgur.com/%s%s' % (image['hash'], ext)],
'size': image['size'],
'container': ext[1:]
},
'thumbnail': {
'src': ['http://i.imgur.com/%ss%s' % (image['hash'], '.jpg')],
'container': 'jpg'
'src': [url],
'size': size,
'container': container
}
}
self.title = image['title']
self.title = r1(r'i\.imgur\.com/([^./]*)', url)
def extract(self, **kwargs):
if 'stream_id' in kwargs and kwargs['stream_id']:

57
src/you_get/extractors/instagram.py Normal file → Executable file
View File

@ -5,24 +5,65 @@ __all__ = ['instagram_download']
from ..common import *
def instagram_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
url = r1(r'([^?]*)', url)
html = get_html(url)
vid = r1(r'instagram.com/p/([^/]+)', url)
description = r1(r'<meta property="og:title" content="([^"]*)"', html)
vid = r1(r'instagram.com/\w+/([^/]+)', url)
description = r1(r'<meta property="og:title" content="([^"]*)"', html) or \
r1(r'<title>\s([^<]*)</title>', html) # with logged-in cookies
title = "{} [{}]".format(description.replace("\n", " "), vid)
stream = r1(r'<meta property="og:video" content="([^"]*)"', html)
if stream:
_, ext, size = url_info(stream)
else:
image = r1(r'<meta property="og:image" content="([^"]*)"', html)
ext = 'jpg'
_, _, size = url_info(image)
print_info(site_info, title, ext, size)
url = stream if stream else image
if not info_only:
download_urls([url], title, ext, size, output_dir, merge=merge)
download_urls([stream], title, ext, size, output_dir, merge=merge)
else:
data = re.search(r'window\._sharedData\s*=\s*(.*);</script>', html)
if data is not None:
info = json.loads(data.group(1))
post = info['entry_data']['PostPage'][0]
else:
# with logged-in cookies
data = re.search(r'window\.__additionalDataLoaded\(\'[^\']+\',(.*)\);</script>', html)
if data is not None:
log.e('[Error] Cookies needed.')
post = json.loads(data.group(1))
if 'edge_sidecar_to_children' in post['graphql']['shortcode_media']:
edges = post['graphql']['shortcode_media']['edge_sidecar_to_children']['edges']
for edge in edges:
title = edge['node']['shortcode']
image_url = edge['node']['display_url']
if 'video_url' in edge['node']:
image_url = edge['node']['video_url']
ext = image_url.split('?')[0].split('.')[-1]
size = int(get_head(image_url)['Content-Length'])
print_info(site_info, title, ext, size)
if not info_only:
download_urls(urls=[image_url],
title=title,
ext=ext,
total_size=size,
output_dir=output_dir)
else:
title = post['graphql']['shortcode_media']['shortcode']
image_url = post['graphql']['shortcode_media']['display_url']
if 'video_url' in post['graphql']['shortcode_media']:
image_url = post['graphql']['shortcode_media']['video_url']
ext = image_url.split('?')[0].split('.')[-1]
size = int(get_head(image_url)['Content-Length'])
print_info(site_info, title, ext, size)
if not info_only:
download_urls(urls=[image_url],
title=title,
ext=ext,
total_size=size,
output_dir=output_dir)
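# The branches above assume the inline JSON looks roughly like this sketch
# (only the fields actually read are shown): window._sharedData carries
#   {"entry_data": {"PostPage": [{"graphql": {"shortcode_media": {
#       "shortcode": ..., "display_url": ..., "video_url": ...,
#       "edge_sidecar_to_children": {"edges": [{"node": {...}}]}}}}]}}
# while the logged-in __additionalDataLoaded payload starts directly at the
# object holding "graphql".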
site_info = "Instagram.com"
download = instagram_download

View File

@ -3,14 +3,18 @@
__all__ = ['iqilu_download']
from ..common import *
import json
def iqilu_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
''''''
if re.match(r'http://v.iqilu.com/\w+', url):
patt = r'url\s*:\s*\[([^\]]+)\]'
#URL in webpage
html = get_content(url)
url = match1(html, r"<input type='hidden' id='playerId' url='(.+)'")
player_data = '[' + match1(html, patt) + ']'
urls = json.loads(player_data)
url = urls[0]['stream_url']
#grab title
title = match1(html, r'<meta name="description" content="(.*?)\"\W')

View File

@ -20,7 +20,7 @@ Changelog:
use @fffonion 's method in #617.
Add trace AVM (asasm) code in Iqiyi's encode function where the salt is put into the encode array, reassemble it with RABCDasm (or WinRABCDasm), then use Fiddler's AutoResponder function to respond with the modified file in place of the original, set the browser to use Fiddler as its proxy, play the video with a !debug version! Flash Player, and finally read the result from flashlog.txt (its location is easily found with a search engine).
The code looks like this (ignore the text after #comment:); it simply does the job: trace("{IQIYI_SALT}:"+salt_array.join(""))
```(Postion After getTimer)
```(Position After getTimer)
findpropstrict QName(PackageNamespace(""), "trace")
pushstring "{IQIYI_SALT}:" #comment for you to locate the salt
getscopeobject 1
@ -97,7 +97,9 @@ class Iqiyi(VideoExtractor):
{'id': '4k', 'container': 'm3u8', 'video_profile': '4k'},
{'id': 'BD', 'container': 'm3u8', 'video_profile': '1080p'},
{'id': 'TD', 'container': 'm3u8', 'video_profile': '720p'},
{'id': 'TD_H265', 'container': 'm3u8', 'video_profile': '720p H265'},
{'id': 'HD', 'container': 'm3u8', 'video_profile': '540p'},
{'id': 'HD_H265', 'container': 'm3u8', 'video_profile': '540p H265'},
{'id': 'SD', 'container': 'm3u8', 'video_profile': '360p'},
{'id': 'LD', 'container': 'm3u8', 'video_profile': '210p'},
]
@ -108,8 +110,8 @@ class Iqiyi(VideoExtractor):
stream_to_bid = { '4k': 10, 'fullhd' : 5, 'suprt-high' : 4, 'super' : 3, 'high' : 2, 'standard' :1, 'topspeed' :96}
'''
ids = ['4k','BD', 'TD', 'HD', 'SD', 'LD']
vd_2_id = {10: '4k', 19: '4k', 5:'BD', 18: 'BD', 21: 'HD', 2: 'HD', 4: 'TD', 17: 'TD', 96: 'LD', 1: 'SD'}
id_2_profile = {'4k':'4k', 'BD': '1080p','TD': '720p', 'HD': '540p', 'SD': '360p', 'LD': '210p'}
vd_2_id = {10: '4k', 19: '4k', 5:'BD', 18: 'BD', 21: 'HD_H265', 2: 'HD', 4: 'TD', 17: 'TD_H265', 96: 'LD', 1: 'SD', 14: 'TD'}
id_2_profile = {'4k':'4k', 'BD': '1080p','TD': '720p', 'HD': '540p', 'SD': '360p', 'LD': '210p', 'HD_H265': '540p H265', 'TD_H265': '720p H265'}
@ -117,10 +119,10 @@ class Iqiyi(VideoExtractor):
self.url = url
video_page = get_content(url)
videos = set(re.findall(r'<a href="(http://www\.iqiyi\.com/v_[^"]+)"', video_page))
videos = set(re.findall(r'<a href="(?=https?:)?(//www\.iqiyi\.com/v_[^"]+)"', video_page))
for video in videos:
self.__class__().download_by_url(video, **kwargs)
self.__class__().download_by_url('https:' + video, **kwargs)
def prepare(self, **kwargs):
assert self.url or self.vid
@ -129,15 +131,17 @@ class Iqiyi(VideoExtractor):
html = get_html(self.url)
tvid = r1(r'#curid=(.+)_', self.url) or \
r1(r'tvid=([^&]+)', self.url) or \
r1(r'data-player-tvid="([^"]+)"', html)
r1(r'data-player-tvid="([^"]+)"', html) or r1(r'tv(?:i|I)d=(.+?)\&', html) or r1(r'param\[\'tvid\'\]\s*=\s*"(.+?)"', html)
videoid = r1(r'#curid=.+_(.*)$', self.url) or \
r1(r'vid=([^&]+)', self.url) or \
r1(r'data-player-videoid="([^"]+)"', html)
r1(r'data-player-videoid="([^"]+)"', html) or r1(r'vid=(.+?)\&', html) or r1(r'param\[\'vid\'\]\s*=\s*"(.+?)"', html)
self.vid = (tvid, videoid)
self.title = match1(html, '<title>([^<]+)').split('-')[0]
info_u = 'http://pcw-api.iqiyi.com/video/video/playervideoinfo?tvid=' + tvid
json_res = get_content(info_u)
self.title = json.loads(json_res)['data']['vn']
tvid, videoid = self.vid
info = getVMS(tvid, videoid)
assert info['code'] == 'A00000', 'can\'t play this video'
assert info['code'] == 'A00000', "can't play this video"
for stream in info['data']['vidl']:
try:
@ -145,8 +149,8 @@ class Iqiyi(VideoExtractor):
if stream_id in self.stream_types:
continue
stream_profile = self.id_2_profile[stream_id]
self.streams[stream_id] = {'video_profile': stream_profile, 'container': 'm3u8', 'src': [stream['m3u']], 'size' : 0}
except:
self.streams[stream_id] = {'video_profile': stream_profile, 'container': 'm3u8', 'src': [stream['m3u']], 'size' : 0, 'm3u8_url': stream['m3u']}
except Exception as e:
log.i("vd: {} is not handled".format(stream['vd']))
log.i("info is {}".format(stream))
@ -199,9 +203,7 @@ class Iqiyi(VideoExtractor):
# For legacy main()
#Here's the change!!
download_url_ffmpeg(urls[0], self.title, 'mp4',
output_dir=kwargs['output_dir'],
merge=kwargs['merge'],)
download_url_ffmpeg(urls[0], self.title, 'mp4', output_dir=kwargs['output_dir'], merge=kwargs['merge'], stream=False)
if not kwargs['caption']:
print('Skipping captions.')

View File

@ -0,0 +1,50 @@
#!/usr/bin/env python
__all__ = ['iwara_download']
from ..common import *
headers = {
'DNT': '1',
'Accept-Encoding': 'gzip, deflate, sdch, br',
'Accept-Language': 'en-CA,en;q=0.8,en-US;q=0.6,zh-CN;q=0.4,zh;q=0.2',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'Save-Data': 'on',
'Cookie':'has_js=1;show_adult=1',
}
stream_types = [
{'id': 'Source', 'container': 'mp4', 'video_profile': '原始'},
{'id': '540p', 'container': 'mp4', 'video_profile': '540p'},
{'id': '360p', 'container': 'mp4', 'video_profile': '360P'},
]
def iwara_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
global headers
video_hash = match1(url, r'https?://\w+.iwara.tv/videos/(\w+)')
video_url = match1(url, r'(https?://\w+.iwara.tv)/videos/\w+')
html = get_content(url, headers=headers)
title = r1(r'<title>(.*)</title>', html)
api_url = video_url + '/api/video/' + video_hash
content = get_content(api_url, headers=headers)
data = json.loads(content)
down_urls = 'https:' + data[0]['uri']
type, ext, size = url_info(down_urls, headers=headers)
print_info(site_info, title+data[0]['resolution'], type, size)
if not info_only:
download_urls([down_urls], title, ext, size, output_dir, merge=merge, headers=headers)
def download_playlist_by_url( url, **kwargs):
video_page = get_content(url)
# url_first=re.findall(r"(http[s]?://[^/]+)",url)
url_first=match1(url, r"(http[s]?://[^/]+)")
# print (url_first)
videos = set(re.findall(r'<a href="(/videos/[^"]+)"', video_page))
if(len(videos)>0):
for video in videos:
iwara_download(url_first+video, **kwargs)
else:
maybe_print('no videos found on this page')
site_info = "Iwara"
download = iwara_download
download_playlist = download_playlist_by_url

View File

@ -0,0 +1,157 @@
#!/usr/bin/env python
import base64
import binascii
from ..common import *
import random
import string
import ctypes
from json import loads
from urllib import request
__all__ = ['ixigua_download', 'ixigua_download_playlist_by_url']
headers = {
"user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 "
"Safari/537.36",
}
def int_overflow(val):
maxint = 2147483647
if not -maxint - 1 <= val <= maxint:
val = (val + (maxint + 1)) % (2 * (maxint + 1)) - maxint - 1
return val
def unsigned_right_shitf(n, i):
if n < 0:
n = ctypes.c_uint32(n).value
if i < 0:
return -int_overflow(n << abs(i))
return int_overflow(n >> i)
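# These two helpers emulate JavaScript's 32-bit integer semantics:
# int_overflow() wraps a value into the signed 32-bit range and
# unsigned_right_shitf() mimics the '>>>' operator (illustrative values,
# checked against JS semantics):
#   unsigned_right_shitf(-1, 1)  -> 2147483647    # same as -1 >>> 1 in JS
#   unsigned_right_shitf(-8, 2)  -> 1073741822    # same as -8 >>> 2 in JS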
def get_video_url_from_video_id(video_id):
"""Splicing URLs according to video ID to get video details"""
# from js
data = [""] * 256
for index, _ in enumerate(data):
t = index
for i in range(8):
t = -306674912 ^ unsigned_right_shitf(t, 1) if 1 & t else unsigned_right_shitf(t, 1)
data[index] = t
def tmp():
rand_num = random.random()
path = "/video/urls/v/1/toutiao/mp4/{video_id}?r={random_num}".format(video_id=video_id,
random_num=str(rand_num)[2:])
e = o = r = -1
i, a = 0, len(path)
while i < a:
e = ord(path[i])
i += 1
if e < 128:
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ e)]
else:
if e < 2048:
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (192 | e >> 6 & 31))]
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | 63 & e))]
else:
if 55296 <= e < 57344:
e = (1023 & e) + 64
i += 1
o = 1023 & t.url(i)
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (240 | e >> 8 & 7))]
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | e >> 2 & 63))]
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | o >> 6 & 15 | (3 & e) << 4))]
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | 63 & o))]
else:
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (224 | e >> 12 & 15))]
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | e >> 6 & 63))]
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | 63 & e))]
return "https://ib.365yg.com{path}&s={param}".format(path=path, param=unsigned_right_shitf(r ^ -1, 0))
while 1:
url = tmp()
if url.split("=")[-1][0] != "-": # 参数s不能为负数
return url
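# Hedged reading of the loop above: it builds the reflected CRC-32 table
# (polynomial 0xEDB88320, written as the signed int -306674912) and computes
# the CRC of the UTF-8 path with 0xFFFFFFFF init and a final XOR, then views
# it as a signed 32-bit value -- hence the retry when the 's' parameter comes
# out negative. An equivalent sketch under that assumption:
#   import zlib
#   s = zlib.crc32(path.encode('utf-8'))
#   s = s - 2**32 if s >= 2**31 else s   # signed 32-bit view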
def ixigua_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
# example url: https://www.ixigua.com/i6631065141750268420/#mid=63024814422
resp = urlopen_with_retry(request.Request(url))
html = resp.read().decode('utf-8')
_cookies = []
for c in resp.getheader('Set-Cookie').split("httponly,"):
_cookies.append(c.strip().split(' ')[0])
headers['cookie'] = ' '.join(_cookies)
conf = loads(match1(html, r"window\.config = (.+);"))
if not conf:
log.e("Get window.config from url failed, url: {}".format(url))
return
verify_url = conf['prefix'] + conf['url'] + '?key=' + conf['key'] + '&psm=' + conf['psm'] \
+ '&_signature=' + ''.join(random.sample(string.ascii_letters + string.digits, 31))
try:
ok = get_content(verify_url)
except Exception as e:
ok = e.msg
if ok != 'OK':
log.e("Verify failed, verify_url: {}, result: {}".format(verify_url, ok))
return
html = get_content(url, headers=headers)
video_id = match1(html, r"\"vid\":\"([^\"]+)")
title = match1(html, r"\"player__videoTitle\">.*?<h1.*?>(.*)<\/h1><\/div>")
if not video_id:
log.e("video_id not found, url:{}".format(url))
return
video_info_url = get_video_url_from_video_id(video_id)
video_info = loads(get_content(video_info_url))
if video_info.get("code", 1) != 0:
log.e("Get video info from {} error: server return code {}".format(video_info_url, video_info.get("code", 1)))
return
if not video_info.get("data", None):
log.e("Get video info from {} error: The server returns JSON value"
" without data or data is empty".format(video_info_url))
return
if not video_info["data"].get("video_list", None):
log.e("Get video info from {} error: The server returns JSON value"
" without data.video_list or data.video_list is empty".format(video_info_url))
return
if not video_info["data"]["video_list"].get("video_1", None):
log.e("Get video info from {} error: The server returns JSON value"
" without data.video_list.video_1 or data.video_list.video_1 is empty".format(video_info_url))
return
bestQualityVideo = list(video_info["data"]["video_list"].keys())[-1]  # there is not only video_1; there might also be video_2
size = int(video_info["data"]["video_list"][bestQualityVideo]["size"])
print_info(site_info=site_info, title=title, type="mp4", size=size)  # this site only serves mp4 files
if not info_only:
video_url = base64.b64decode(video_info["data"]["video_list"][bestQualityVideo]["main_url"].encode("utf-8"))
download_urls([video_url.decode("utf-8")], title, "mp4", size, output_dir, merge=merge, headers=headers, **kwargs)
def ixigua_download_playlist_by_url(url, output_dir='.', merge=True, info_only=False, **kwargs):
assert "user" in url, "Only support users to publish video list,Please provide a similar url:" \
"https://www.ixigua.com/c/user/6907091136/"
user_id = url.split("/")[-2] if url[-1] == "/" else url.split("/")[-1]
params = {"max_behot_time": "0", "max_repin_time": "0", "count": "20", "page_type": "0", "user_id": user_id}
while 1:
url = "https://www.ixigua.com/c/user/article/?" + "&".join(["{}={}".format(k, v) for k, v in params.items()])
video_list = loads(get_content(url, headers=headers))
params["max_behot_time"] = video_list["next"]["max_behot_time"]
for video in video_list["data"]:
ixigua_download("https://www.ixigua.com/i{}/".format(video["item_id"]), output_dir, merge, info_only,
**kwargs)
if video_list["next"]["max_behot_time"] == 0:
break
site_info = "ixigua.com"
download = ixigua_download
download_playlist = ixigua_download_playlist_by_url

View File

@ -1,23 +0,0 @@
#!/usr/bin/env python
__all__ = ['jpopsuki_download']
from ..common import *
def jpopsuki_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url, faker=True)
title = r1(r'<meta name="title" content="([^"]*)"', html)
if title.endswith(' - JPopsuki TV'):
title = title[:-14]
url = "http://jpopsuki.tv%s" % r1(r'<source src="([^"]*)"', html)
type, ext, size = url_info(url, faker=True)
print_info(site_info, title, type, size)
if not info_only:
download_urls([url], title, ext, size, output_dir, merge=merge, faker=True)
site_info = "JPopsuki.tv"
download = jpopsuki_download
download_playlist = playlist_not_supported('jpopsuki')

View File

@ -0,0 +1,50 @@
#!/usr/bin/env python
from ..common import *
from .universal import *
__all__ = ['kakao_download']
def kakao_download(url, output_dir='.', info_only=False, **kwargs):
json_request_url = 'https://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?vid={}'
# playlists are not supported in this implementation, so strip the playlist part of the url
# change this if playlist support is ever needed
if re.search('playlistId', url):
url = re.search(r"(.+)\?.+?", url).group(1)
page = get_content(url)
try:
vid = re.search(r"<meta name=\"vid\" content=\"(.+)\">", page).group(1)
title = re.search(r"<meta name=\"title\" content=\"(.+)\">", page).group(1)
meta_str = get_content(json_request_url.format(vid))
meta_json = json.loads(meta_str)
standard_preset = meta_json['output_list']['standard_preset']
output_videos = meta_json['output_list']['output_list']
size = ''
if meta_json['svcname'] == 'smr_pip':
for v in output_videos:
if v['preset'] == 'mp4_PIP_SMR_480P':
size = int(v['filesize'])
break
else:
for v in output_videos:
if v['preset'] == standard_preset:
size = int(v['filesize'])
break
video_url = meta_json['location']['url']
print_info(site_info, title, 'mp4', size)
if not info_only:
download_urls([video_url], title, 'mp4', size, output_dir, **kwargs)
except:
universal_download(url, output_dir, merge=kwargs['merge'], info_only=info_only, **kwargs)
site_info = "tv.kakao.com"
download = kakao_download
download_playlist = playlist_not_supported('kakao')

View File

@ -14,7 +14,7 @@ def ku6_download_by_id(id, title = None, output_dir = '.', merge = True, info_on
title = title or t
assert title
urls = f.split(',')
ext = re.sub(r'.*\.', '', urls[0])
ext = match1(urls[0], r'.*\.(\w+)\??[^\.]*')
assert ext in ('flv', 'mp4', 'f4v'), ext
ext = {'f4v': 'flv'}.get(ext, ext)
size = 0
@ -37,6 +37,30 @@ def ku6_download(url, output_dir = '.', merge = True, info_only = False, **kwarg
r'http://my.ku6.com/watch\?.*v=(.*)\.\..*']
id = r1_of(patterns, url)
if id is None:
# http://www.ku6.com/2017/detail-zt.html?vid=xvqTmvZrH8MNvErpvRxFn3
page = get_content(url)
meta = re.search(r'detailDataMap=(\{.+?\});', page)
if meta is not None:
meta = meta.group(1)
else:
raise Exception('Unsupported url')
vid = re.search(r'vid=([^&]+)', url)
if vid is not None:
vid = vid.group(1)
else:
raise Exception('Unsupported url')
this_meta = re.search('"?'+vid+'"?:\{(.+?)\}', meta)
if this_meta is not None:
this_meta = this_meta.group(1)
title = re.search('title:"(.+?)"', this_meta).group(1)
video_url = re.search('playUrl:"(.+?)"', this_meta).group(1)
video_size = url_size(video_url)
print_info(site_info, title, 'mp4', video_size)
if not info_only:
download_urls([video_url], title, 'mp4', video_size, output_dir, merge=merge, **kwargs)
return
ku6_download_by_id(id, output_dir = output_dir, merge = merge, info_only = info_only)
def baidu_ku6(url):
@ -48,6 +72,10 @@ def baidu_ku6(url):
if isrc is not None:
h2 = get_html(isrc)
id = match1(h2, r'http://v.ku6.com/show/(.*)\.\.\.html')
# fix #1746
# some ku6 urls really end with three dots? A bug?
if id is None:
id = match1(h2, r'http://v.ku6.com/show/(.*)\.html')
return id

View File

@ -0,0 +1,42 @@
#!/usr/bin/env python
import urllib.request
import urllib.parse
import json
import re
from ..util import log
from ..common import get_content, download_urls, print_info, playlist_not_supported, url_size
__all__ = ['kuaishou_download_by_url']
def kuaishou_download_by_url(url, info_only=False, **kwargs):
page = get_content(url)
# size = video_list[-1]['size']
# (this gives the wrong size)
try:
search_result=re.search(r"\"playUrls\":\[(\{\"quality\"\:\"\w+\",\"url\":\".*?\"\})+\]", page)
all_video_info_str = search_result.group(1)
all_video_infos=re.findall(r"\{\"quality\"\:\"(\w+)\",\"url\":\"(.*?)\"\}", all_video_info_str)
# get the one with the best quality
video_url = all_video_infos[0][1].encode("utf-8").decode('unicode-escape')
title = re.search(r"<meta charset=UTF-8><title>(.*?)</title>", page).group(1)
size = url_size(video_url)
video_format = "flv"#video_url.split('.')[-1]
print_info(site_info, title, video_format, size)
if not info_only:
download_urls([video_url], title, video_format, size, **kwargs)
except:# extract image
og_image_url = re.search(r"<meta\s+property=\"og:image\"\s+content=\"(.+?)\"/>", page).group(1)
image_url = og_image_url
title = url.split('/')[-1]
size = url_size(image_url)
image_format = image_url.split('.')[-1]
print_info(site_info, title, image_format, size)
if not info_only:
download_urls([image_url], title, image_format, size, **kwargs)
site_info = "kuaishou.com"
download = kuaishou_download_by_url
download_playlist = playlist_not_supported('kuaishou')

View File

@ -8,6 +8,7 @@ from base64 import b64decode
import re
import hashlib
def kugou_download(url, output_dir=".", merge=True, info_only=False, **kwargs):
if url.lower().find("5sing") != -1:
# for 5sing.kugou.com
@ -20,31 +21,72 @@ def kugou_download(url, output_dir=".", merge=True, info_only=False, **kwargs):
print_info(site_info, title, songtype, size)
if not info_only:
download_urls([url], title, ext, size, output_dir, merge=merge)
elif url.lower().find("hash") != -1:
return kugou_download_by_hash(url, output_dir, merge, info_only)
else:
# for the www.kugou.com/
return kugou_download_playlist(url, output_dir=output_dir, merge=merge, info_only=info_only)
# raise NotImplementedError(url)
def kugou_download_by_hash(title,hash_val,output_dir = '.', merge = True, info_only = False):
def kugou_download_by_hash(url, output_dir='.', merge=True, info_only=False):
# sample
#url_sample:http://www.kugou.com/yy/album/single/536957.html
# hash -> key: md5(hash + "kgcloud"); the key scheme was found by decompiling the swf
# cmd 4 for mp3, cmd 3 for m4a
key=hashlib.new('md5',(hash_val+"kgcloud").encode("utf-8")).hexdigest()
html=get_html("http://trackercdn.kugou.com/i/?pid=6&key=%s&acceptMp3=1&cmd=4&hash=%s"%(key,hash_val))
# url_sample:http://www.kugou.com/song/#hash=93F7D2FC6E95424739448218B591AEAF&album_id=9019462
hash_val = match1(url, 'hash=(\w+)')
album_id = match1(url, 'album_id=(\d+)')
if not album_id:
album_id = 123
html = get_html("http://www.kugou.com/yy/index.php?r=play/getdata&hash={}&album_id={}&mid=123".format(hash_val, album_id))
j = loads(html)
url=j['url']
url = j['data']['play_url']
title = j['data']['audio_name']
# some songs can't be played because of copyright protection
if (url == ''):
return
songtype, ext, size = url_info(url)
print_info(site_info, title, songtype, size)
if not info_only:
download_urls([url], title, ext, size, output_dir, merge=merge)
def kugou_download_playlist(url, output_dir='.', merge=True, info_only=False, **kwargs):
urls = []
# download music leaderboard
# sample: http://www.kugou.com/yy/html/rank.html
if url.lower().find('rank') != -1:
html = get_html(url)
pattern=re.compile('title="(.*?)".* data="(\w*)\|.*?"')
pairs=pattern.findall(html)
for title,hash_val in pairs:
kugou_download_by_hash(title,hash_val,output_dir,merge,info_only)
pattern = re.compile('<a href="(http://.*?)" data-active=')
res = pattern.findall(html)
for song in res:
res = get_html(song)
pattern_url = re.compile('"hash":"(\w+)".*"album_id":(\d+)')
hash_val, album_id = pattern_url.findall(res)[0]
if not album_id:
album_id = 123
urls.append('http://www.kugou.com/song/#hash=%s&album_id=%s' % (hash_val, album_id))
# download album
# album sample: http://www.kugou.com/yy/album/single/1645030.html
elif url.lower().find('album') != -1:
html = get_html(url)
pattern = re.compile('var data=(\[.*?\]);')
res = pattern.findall(html)[0]
for v in json.loads(res):
urls.append('http://www.kugou.com/song/#hash=%s&album_id=%s' % (v['hash'], v['album_id']))
# download the playlist
# playlist sample:http://www.kugou.com/yy/special/single/487279.html
else:
html = get_html(url)
pattern = re.compile('data="(\w+)\|(\d+)"')
for v in pattern.findall(html):
urls.append('http://www.kugou.com/song/#hash=%s&album_id=%s' % (v[0], v[1]))
print('http://www.kugou.com/song/#hash=%s&album_id=%s' % (v[0], v[1]))
# download the list by hash
for url in urls:
kugou_download_by_hash(url, output_dir, merge, info_only)
site_info = "kugou.com"

View File

@ -2,19 +2,22 @@
__all__ = ['letv_download', 'letvcloud_download', 'letvcloud_download_by_vu']
import json
import base64
import hashlib
import random
import xml.etree.ElementTree as ET
import base64, hashlib, urllib, time, re
import urllib
from ..common import *
# @DEPRECATED
def get_timestamp():
tn = random.random()
url = 'http://api.letv.com/time?tn={}'.format(tn)
result = get_content(url)
return json.loads(result)['stime']
# @DEPRECATED
def get_key(t):
for s in range(0, 8):
@ -24,9 +27,12 @@ def get_key(t):
t += e
return t ^ 185025305
def calcTimeKey(t):
ror = lambda val, r_bits,: ((val & (2 ** 32 - 1)) >> r_bits % 32) | (val << (32 - (r_bits % 32)) & (2 ** 32 - 1))
return ror(ror(t,773625421%13)^773625421,773625421%17)
magic = 185025305
return ror(t, magic % 17) ^ magic
# return ror(ror(t,773625421%13)^773625421,773625421%17)
def decode(data):
@ -46,25 +52,20 @@ def decode(data):
return ''.join([chr(i) for i in loc7])
else:
# directly return
return data
return str(data)
def video_info(vid, **kwargs):
url = 'http://api.letv.com/mms/out/video/playJson?id={}&platid=1&splatid=101&format=1&tkey={}&domain=www.letv.com'.format(vid,calcTimeKey(int(time.time())))
url = 'http://player-pc.le.com/mms/out/video/playJson?id={}&platid=1&splatid=105&format=1&tkey={}&domain=www.le.com&region=cn&source=1000&accesyx=1'.format(vid, calcTimeKey(int(time.time())))
r = get_content(url, decoded=False)
info = json.loads(str(r, "utf-8"))
info = info['msgs']
stream_id = None
support_stream_id = info["playurl"]["dispatch"].keys()
if "stream_id" in kwargs and kwargs["stream_id"].lower() in support_stream_id:
stream_id = kwargs["stream_id"]
else:
print("Current Video Supports:")
for i in support_stream_id:
print("\t--format",i,"<URL>")
if "1080p" in support_stream_id:
stream_id = '1080p'
elif "720p" in support_stream_id:
@ -73,19 +74,23 @@ def video_info(vid,**kwargs):
stream_id = sorted(support_stream_id, key=lambda i: int(i[1:]))[-1]
url = info["playurl"]["domain"][0] + info["playurl"]["dispatch"][stream_id][0]
uuid = hashlib.sha1(url.encode('utf8')).hexdigest() + '_0'
ext = info["playurl"]["dispatch"][stream_id][1].split('.')[-1]
url+="&ctv=pc&m3v=1&termid=1&format=1&hwtype=un&ostype=Linux&tag=letv&sign=letv&expect=3&tn={}&pay=0&iscpn=f9051&rateid={}".format(random.random(),stream_id)
url = url.replace('tss=0', 'tss=ios')
url += "&m3v=1&termid=1&format=1&hwtype=un&ostype=MacOS10.12.4&p1=1&p2=10&p3=-&expect=3&tn={}&vid={}&uuid={}&sign=letv".format(random.random(), vid, uuid)
r2 = get_content(url, decoded=False)
info2 = json.loads(str(r2, "utf-8"))
# hold on ! more things to do
# to decode m3u8 (encoded)
m3u8 = get_content(info2["location"],decoded=False)
suffix = '&r=' + str(int(time.time() * 1000)) + '&appid=500'
m3u8 = get_content(info2["location"] + suffix, decoded=False)
m3u8_list = decode(m3u8)
urls = re.findall(r'^[^#][^\r]*',m3u8_list,re.MULTILINE)
urls = re.findall(r'(http.*?)#', m3u8_list, re.MULTILINE)
return ext, urls
def letv_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False, **kwargs):
ext, urls = video_info(vid, **kwargs)
size = 0
@ -97,6 +102,7 @@ def letv_download_by_vid(vid,title, output_dir='.', merge=True, info_only=False,
if not info_only:
download_urls(urls, title, ext, size, output_dir=output_dir, merge=merge)
def letvcloud_download_by_vu(vu, uu, title=None, output_dir='.', merge=True, info_only=False):
# ran = float('0.' + str(random.randint(0, 9999999999999999))) # For ver 2.1
# str2Hash = 'cfflashformatjsonran{ran}uu{uu}ver2.2vu{vu}bie^#@(%27eib58'.format(vu = vu, uu = uu, ran = ran) #Magic!/ In ver 2.1
@ -118,6 +124,7 @@ def letvcloud_download_by_vu(vu, uu, title=None, output_dir='.', merge=True, inf
if not info_only:
download_urls(urls, title, ext, size, output_dir=output_dir, merge=merge)
def letvcloud_download(url, output_dir='.', merge=True, info_only=False):
qs = parse.urlparse(url).query
vu = match1(qs, r'vu=([\w]+)')
@ -125,9 +132,16 @@ def letvcloud_download(url, output_dir='.', merge=True, info_only=False):
title = "LETV-%s" % vu
letvcloud_download_by_vu(vu, uu, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
def letv_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
url = url_locations([url])[0]
if re.match(r'http://yuntv.letv.com/', url):
letvcloud_download(url, output_dir=output_dir, merge=merge, info_only=info_only)
elif 'sports.le.com' in url:
html = get_content(url)
vid = match1(url, r'video/(\d+)\.html')
title = match1(html, r'<h2 class="title">([^<]+)</h2>')
letv_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
else:
html = get_content(url)
vid = match1(url, r'http://www.letv.com/ptv/vplay/(\d+).html') or \
@ -136,6 +150,7 @@ def letv_download(url, output_dir='.', merge=True, info_only=False ,**kwargs):
title = match1(html, r'name="irTitle" content="(.*?)"')
letv_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
site_info = "Le.com"
download = letv_download
download_playlist = playlist_not_supported('letv')
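The new `calcTimeKey` above is just a 32-bit rotate-right of the current Unix timestamp XORed with a magic constant (the double rotation of the old version is kept only as a comment). A self-contained sketch of that helper:

```python
import time

def ror(val, r_bits):
    """32-bit rotate right."""
    r_bits %= 32
    val &= 0xffffffff
    return ((val >> r_bits) | (val << (32 - r_bits))) & 0xffffffff

def calc_time_key(t):
    magic = 185025305
    return ror(t, magic % 17) ^ magic  # 185025305 % 17 == 8, i.e. rotate by 8 bits

print(calc_time_key(int(time.time())))
```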

View File

@ -2,39 +2,66 @@
__all__ = ['lizhi_download']
import json
import datetime
from ..common import *
def lizhi_download_playlist(url, output_dir = '.', merge = True, info_only = False, **kwargs):
# like this http://www.lizhi.fm/#/31365/
#api desc: s->start l->length band->some radio
#http://www.lizhi.fm/api/radio_audios?s=0&l=100&band=31365
band_id = match1(url,r'#/(\d+)')
#try to get a considerable large l to reduce html parsing task.
api_url = 'http://www.lizhi.fm/api/radio_audios?s=0&l=65535&band='+band_id
content_json = json.loads(get_content(api_url))
for sound in content_json:
title = sound["name"]
res_url = sound["url"]
songtype, ext, size = url_info(res_url,faker=True)
print_info(site_info, title, songtype, size)
if not info_only:
#no referer no speed!
download_urls([res_url], title, ext, size, output_dir, merge=merge ,refer = 'http://www.lizhi.fm',faker=True)
pass
#
# Worked well but not perfect.
# TODO: add option --format={sd|hd}
#
def get_url(ep):
readable = datetime.datetime.fromtimestamp(int(ep['create_time']) / 1000).strftime('%Y/%m/%d')
return 'http://cdn5.lizhi.fm/audio/{}/{}_hd.mp3'.format(readable, ep['id'])
def lizhi_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
# url like http://www.lizhi.fm/#/549759/18864883431656710
api_id = match1(url,r'#/(\d+/\d+)')
api_url = 'http://www.lizhi.fm/api/audio/'+api_id
content_json = json.loads(get_content(api_url))
title = content_json["audio"]["name"]
res_url = content_json["audio"]["url"]
songtype, ext, size = url_info(res_url,faker=True)
print_info(site_info, title, songtype, size)
if not info_only:
#no referer no speed!
download_urls([res_url], title, ext, size, output_dir, merge=merge ,refer = 'http://www.lizhi.fm',faker=True)
# radio_id: e.g. 549759 from http://www.lizhi.fm/549759/
#
# Returns a list of tuples (audio_id, title, url) for each episode
# (audio) in the radio playlist. url is the direct link to the audio
# file.
def lizhi_extract_playlist_info(radio_id):
# /api/radio_audios API parameters:
#
# - s: starting episode
# - l: count (per page)
# - band: radio_id
#
# We use l=65535 for poor man's pagination (that is, no pagination
# at all -- hope all fits on a single page).
#
# TODO: Use /api/radio?band={radio_id} to get number of episodes
# (au_cnt), then handle pagination properly.
api_url = 'http://www.lizhi.fm/api/radio_audios?s=0&l=65535&band=%s' % radio_id
api_response = json.loads(get_content(api_url))
return [(ep['id'], ep['name'], get_url(ep)) for ep in api_response]
def lizhi_download_audio(audio_id, title, url, output_dir='.', info_only=False):
filetype, ext, size = url_info(url)
print_info(site_info, title, filetype, size)
if not info_only:
download_urls([url], title, ext, size, output_dir=output_dir)
def lizhi_download_playlist(url, output_dir='.', info_only=False, **kwargs):
# Sample URL: http://www.lizhi.fm/549759/
radio_id = match1(url,r'/(\d+)')
if not radio_id:
raise NotImplementedError('%s not supported' % url)
for audio_id, title, url in lizhi_extract_playlist_info(radio_id):
lizhi_download_audio(audio_id, title, url, output_dir=output_dir, info_only=info_only)
def lizhi_download(url, output_dir='.', info_only=False, **kwargs):
# Sample URL: http://www.lizhi.fm/549759/18864883431656710/
m = re.search(r'/(?P<radio_id>\d+)/(?P<audio_id>\d+)', url)
if not m:
raise NotImplementedError('%s not supported' % url)
radio_id = m.group('radio_id')
audio_id = m.group('audio_id')
# Look for the audio_id among the full list of episodes
for aid, title, url in lizhi_extract_playlist_info(radio_id):
if aid == audio_id:
lizhi_download_audio(audio_id, title, url, output_dir=output_dir, info_only=info_only)
break
else:
raise NotImplementedError('Audio #%s not found in playlist #%s' % (audio_id, radio_id))
site_info = "lizhi.fm"
download = lizhi_download
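The direct-URL trick used above: an episode's `create_time` (in milliseconds) is turned into a `YYYY/MM/DD` path segment on the CDN, and the episode id plus an `_hd` suffix completes the file name. A worked example with an invented episode record (the exact date depends on the local timezone):

```python
import datetime

def lizhi_hd_url(ep):
    day = datetime.datetime.fromtimestamp(int(ep['create_time']) / 1000).strftime('%Y/%m/%d')
    return 'http://cdn5.lizhi.fm/audio/{}/{}_hd.mp3'.format(day, ep['id'])

ep = {'create_time': 1478361600000, 'id': 18864883431656710}  # invented sample
print(lizhi_hd_url(ep))
# e.g. http://cdn5.lizhi.fm/audio/2016/11/05/18864883431656710_hd.mp3
```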

View File

@ -0,0 +1,74 @@
#!/usr/bin/env python
__all__ = ['longzhu_download']
import json
from ..common import (
get_content,
general_m3u8_extractor,
match1,
print_info,
download_urls,
playlist_not_supported,
)
from ..common import player
def longzhu_download(url, output_dir = '.', merge=True, info_only=False, **kwargs):
web_domain = url.split('/')[2]
if (web_domain == 'star.longzhu.com') or (web_domain == 'y.longzhu.com'):
domain = url.split('/')[3].split('?')[0]
m_url = 'http://m.longzhu.com/{0}'.format(domain)
m_html = get_content(m_url)
room_id_patt = r'var\s*roomId\s*=\s*(\d+);'
room_id = match1(m_html,room_id_patt)
json_url = 'http://liveapi.plu.cn/liveapp/roomstatus?roomId={0}'.format(room_id)
content = get_content(json_url)
data = json.loads(content)
streamUri = data['streamUri']
if len(streamUri) <= 4:
raise ValueError('The live stream is not online!')
title = data['title']
streamer = data['userName']
title = '{}: {}'.format(streamer, title)
stream_api_url = 'http://livestream.plu.cn/live/getlivePlayurl?roomId={0}'.format(room_id)
content = get_content(stream_api_url)
data = json.loads(content)
isonline = data.get('isTransfer')
if isonline == '0':
raise ValueError('The live stream is not online!')
real_url = data['playLines'][0]['urls'][0]['securityUrl']
print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_urls([real_url], title, 'flv', None, output_dir, merge=merge)
elif web_domain == 'replay.longzhu.com':
videoid = match1(url, r'(\d+)$')
json_url = 'http://liveapi.longzhu.com/livereplay/getreplayfordisplay?videoId={0}'.format(videoid)
content = get_content(json_url)
data = json.loads(content)
username = data['userName']
title = data['title']
title = '{}: {}'.format(username, title)
real_url = data['videoUrl']
if player:
print_info('Longzhu Video', title, 'm3u8', 0)
download_urls([real_url], title, 'm3u8', 0, output_dir, merge=merge)
else:
urls = general_m3u8_extractor(real_url)
print_info('Longzhu Video', title, 'm3u8', 0)
if not info_only:
download_urls(urls, title, 'ts', 0, output_dir=output_dir, merge=merge, **kwargs)
else:
raise ValueError('Wrong url or unsupported link ... {0}'.format(url))
site_info = 'longzhu.com'
download = longzhu_download
download_playlist = playlist_not_supported('longzhu')

View File

@ -3,15 +3,19 @@
__all__ = ['magisto_download']
from ..common import *
import json
def magisto_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url)
title1 = r1(r'<meta name="twitter:title" content="([^"]*)"', html)
title2 = r1(r'<meta name="twitter:description" content="([^"]*)"', html)
video_hash = r1(r'http://www.magisto.com/video/([^/]+)', url)
title = "%s %s - %s" % (title1, title2, video_hash)
url = r1(r'<source type="[^"]+" src="([^"]*)"', html)
video_hash = r1(r'video\/([a-zA-Z0-9]+)', url)
api_url = 'https://www.magisto.com/api/video/{}'.format(video_hash)
content = get_html(api_url)
data = json.loads(content)
title1 = data['title']
title2 = data['creator']
title = "%s - %s" % (title1, title2)
url = data['video_direct_url']
type, ext, size = url_info(url)
print_info(site_info, title, type, size)

View File

@ -12,22 +12,25 @@ import re
class MGTV(VideoExtractor):
name = "芒果 (MGTV)"
# Last updated: 2015-11-24
# Last updated: 2016-11-13
stream_types = [
{'id': 'hd', 'container': 'flv', 'video_profile': '超清'},
{'id': 'sd', 'container': 'flv', 'video_profile': '高清'},
{'id': 'ld', 'container': 'flv', 'video_profile': '标清'},
{'id': 'hd', 'container': 'ts', 'video_profile': '超清'},
{'id': 'sd', 'container': 'ts', 'video_profile': '高清'},
{'id': 'ld', 'container': 'ts', 'video_profile': '标清'},
]
id_dic = {i['video_profile']:(i['id']) for i in stream_types}
api_endpoint = 'http://v.api.mgtv.com/player/video?video_id={video_id}'
api_endpoint = 'http://pcweb.api.mgtv.com/player/video?video_id={video_id}'
@staticmethod
def get_vid_from_url(url):
"""Extracts video ID from URL.
"""
return match1(url, 'http://www.mgtv.com/v/\d/\d+/\w+/(\d+).html')
vid = match1(url, 'https?://www.mgtv.com/(?:b|l)/\d+/(\d+).html')
if not vid:
vid = match1(url, 'https?://www.mgtv.com/hz/bdpz/\d+/(\d+).html')
return vid
#----------------------------------------------------------------------
@staticmethod
@ -44,10 +47,15 @@ class MGTV(VideoExtractor):
content = get_content(content['info']) #get the REAL M3U url, maybe to be changed later?
segment_list = []
segments_size = 0
for i in content.split():
if not i.startswith('#'): # not the best way; better to use the m3u8 package
segment_list.append(base_url + i)
return segment_list
# use the #EXT-MGTV-File-SIZE info for a fast size calculation
elif i.startswith('#EXT-MGTV-File-SIZE:'):
segments_size += int(i[i.rfind(':')+1:])
return m3u_url, segments_size, segment_list
def download_playlist_by_url(self, url, **kwargs):
pass
@ -58,8 +66,9 @@ class MGTV(VideoExtractor):
content = get_content(self.api_endpoint.format(video_id = self.vid))
content = loads(content)
self.title = content['data']['info']['title']
domain = content['data']['stream_domain'][0]
#stream_avalable = [i['name'] for i in content['data']['stream']]
#stream_available = [i['name'] for i in content['data']['stream']]
stream_available = {}
for i in content['data']['stream']:
stream_available[i['name']] = i['url']
@ -68,15 +77,11 @@ class MGTV(VideoExtractor):
if s['video_profile'] in stream_available.keys():
quality_id = self.id_dic[s['video_profile']]
url = stream_available[s['video_profile']]
url = re.sub( r'(\&arange\=\d+)', '', url) #Un-Hum
segment_list_this = self.get_mgtv_real_url(url)
url = domain + re.sub(r'(\&arange\=\d+)', '', url)  # strip the arange parameter
m3u8_url, m3u8_size, segment_list_this = self.get_mgtv_real_url(url)
container_this_stream = ''
size_this_stream = 0
stream_fileid_list = []
for i in segment_list_this:
_, container_this_stream, size_this_seg = url_info(i)
size_this_stream += size_this_seg
stream_fileid_list.append(os.path.basename(i).split('.')[0])
#make pieces
@ -85,10 +90,11 @@ class MGTV(VideoExtractor):
pieces.append({'fileid': i[0], 'segs': i[1],})
self.streams[quality_id] = {
'container': 'flv',
'container': s['container'],
'video_profile': s['video_profile'],
'size': size_this_stream,
'pieces': pieces
'size': m3u8_size,
'pieces': pieces,
'm3u8_url': m3u8_url
}
if not kwargs['info_only']:
@ -107,6 +113,44 @@ class MGTV(VideoExtractor):
# Extract stream with the best quality
stream_id = self.streams_sorted[0]['id']
def download(self, **kwargs):
if 'stream_id' in kwargs and kwargs['stream_id']:
stream_id = kwargs['stream_id']
else:
stream_id = 'null'
# print video info only
if 'info_only' in kwargs and kwargs['info_only']:
if stream_id != 'null':
if 'index' not in kwargs:
self.p(stream_id)
else:
self.p_i(stream_id)
else:
# Display all available streams
if 'index' not in kwargs:
self.p([])
else:
stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
self.p_i(stream_id)
# default to use the best quality
if stream_id == 'null':
stream_id = self.streams_sorted[0]['id']
stream_info = self.streams[stream_id]
if not kwargs['info_only']:
if player:
# pass the m3u8 URL to the player directly, since some players (e.g. mpv) can handle it natively
launch_player(player, [stream_info['m3u8_url']])
else:
download_urls(stream_info['src'], self.title, stream_info['container'], stream_info['size'],
output_dir=kwargs['output_dir'],
merge=kwargs.get('merge', True))
# av=stream_id in self.dash_streams)
site = MGTV()
download = site.download_by_url
download_playlist = site.download_playlist_by_url
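The size trick in `get_mgtv_real_url` above relies on MGTV's vendor-specific `#EXT-MGTV-File-SIZE:` tags, so the total size can be summed from the playlist instead of probing every segment. A standalone sketch of that parsing on an invented playlist:

```python
def parse_mgtv_m3u8(m3u8_text, base_url):
    """Collect segment URLs and total the sizes advertised by
    the #EXT-MGTV-File-SIZE tags."""
    segments, total_size = [], 0
    for line in m3u8_text.split():
        if line.startswith('#EXT-MGTV-File-SIZE:'):
            total_size += int(line.split(':', 1)[1])
        elif not line.startswith('#'):
            segments.append(base_url + line)
    return segments, total_size

sample = """#EXTM3U
#EXT-MGTV-File-SIZE:1048576
seg0.ts
#EXT-MGTV-File-SIZE:2097152
seg1.ts
"""
print(parse_mgtv_m3u8(sample, 'http://example.com/'))
# (['http://example.com/seg0.ts', 'http://example.com/seg1.ts'], 3145728)
```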

View File

@ -2,12 +2,13 @@
__all__ = ['miaopai_download']
import string
import random
from ..common import *
import urllib.error
import urllib.parse
from ..util import fs
def miaopai_download_by_url(url, output_dir = '.', merge = False, info_only = False, **kwargs):
'''Source: Android mobile'''
if re.match(r'http://video.weibo.com/show\?fid=(\d{4}:\w{32})\w*', url):
fake_headers_mobile = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset': 'UTF-8,*;q=0.5',
@ -15,29 +16,113 @@ def miaopai_download_by_url(url, output_dir = '.', merge = False, info_only = Fa
'Accept-Language': 'en-US,en;q=0.8',
'User-Agent': 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Mobile Safari/537.36'
}
webpage_url = re.search(r'(http://video.weibo.com/show\?fid=\d{4}:\w{32})\w*', url).group(1) + '&type=mp4' #mobile
#grab download URL
a = get_content(webpage_url, headers= fake_headers_mobile , decoded=True)
url = match1(a, r'<video src="(.*?)\"\W')
def miaopai_download_by_fid(fid, output_dir = '.', merge = False, info_only = False, **kwargs):
'''Source: Android mobile'''
page_url = 'http://video.weibo.com/show?fid=' + fid + '&type=mp4'
#grab title
b = get_content(webpage_url) #normal
title = match1(b, r'<meta name="description" content="([\s\S]*?)\"\W')
type_, ext, size = url_info(url)
print_info(site_info, title, type_, size)
mobile_page = get_content(page_url, headers=fake_headers_mobile)
url = match1(mobile_page, r'<video id=.*?src=[\'"](.*?)[\'"]\W')
if url is None:
wb_mp = re.search(r'<script src=([\'"])(.+?wb_mp\.js)\1>', mobile_page).group(2)
return miaopai_download_by_wbmp(wb_mp, fid, output_dir=output_dir, merge=merge,
info_only=info_only, total_size=None, **kwargs)
title = match1(mobile_page, r'<title>((.|\n)+?)</title>')
if not title:
title = fid
title = title.replace('\n', '_')
ext, size = 'mp4', url_info(url)[2]
print_info(site_info, title, ext, size)
if not info_only:
download_urls([url], title, ext, total_size=None, output_dir=output_dir, merge=merge)
#----------------------------------------------------------------------
def miaopai_download_by_wbmp(wbmp_url, fid, info_only=False, **kwargs):
headers = {}
headers.update(fake_headers_mobile)
headers['Host'] = 'imgaliyuncdn.miaopai.com'
wbmp = get_content(wbmp_url, headers=headers)
appid = re.search(r'appid:\s*?([^,]+?),', wbmp).group(1)
jsonp = re.search(r'jsonp:\s*?([\'"])(\w+?)\1', wbmp).group(2)
population = [i for i in string.ascii_lowercase] + [i for i in string.digits]
info_url = '{}?{}'.format('http://p.weibo.com/aj_media/info', parse.urlencode({
'appid': appid.strip(),
'fid': fid,
jsonp.strip(): '_jsonp' + ''.join(random.sample(population, 11))
}))
headers['Host'] = 'p.weibo.com'
jsonp_text = get_content(info_url, headers=headers)
jsonp_dict = json.loads(match1(jsonp_text, r'\(({.+})\)'))
if jsonp_dict['code'] != 200:
log.wtf('[Failed] "%s"' % jsonp_dict['msg'])
video_url = jsonp_dict['data']['meta_data'][0]['play_urls']['l']
title = jsonp_dict['data']['description']
title = title.replace('\n', '_')
ext = 'mp4'
headers['Host'] = 'f.us.sinaimg.cn'
print_info(site_info, title, ext, url_info(video_url, headers=headers)[2])
if not info_only:
download_urls([video_url], fs.legitimize(title), ext, headers=headers, **kwargs)
def miaopai_download_story(url, output_dir='.', merge=False, info_only=False, **kwargs):
data_url = 'https://m.weibo.cn/s/video/object?%s' % url.split('?')[1]
data_content = get_content(data_url, headers=fake_headers_mobile)
data = json.loads(data_content)
title = data['data']['object']['summary']
stream_url = data['data']['object']['stream']['url']
ext = 'mp4'
print_info(site_info, title, ext, url_info(stream_url, headers=fake_headers_mobile)[2])
if not info_only:
download_urls([stream_url], fs.legitimize(title), ext, total_size=None, headers=fake_headers_mobile, **kwargs)
def miaopai_download_direct(url, output_dir='.', merge=False, info_only=False, **kwargs):
mobile_page = get_content(url, headers=fake_headers_mobile)
try:
title = re.search(r'([\'"])title\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
except:
title = re.search(r'([\'"])status_title\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
title = title.replace('\n', '_')
try:
stream_url = re.search(r'([\'"])stream_url\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
except:
page_url = re.search(r'([\'"])page_url\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
return miaopai_download_story(page_url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
ext = 'mp4'
print_info(site_info, title, ext, url_info(stream_url, headers=fake_headers_mobile)[2])
if not info_only:
download_urls([stream_url], fs.legitimize(title), ext, total_size=None, headers=fake_headers_mobile, **kwargs)
def miaopai_download(url, output_dir='.', merge=False, info_only=False, **kwargs):
""""""
if re.match(r'http://video.weibo.com/show\?fid=(\d{4}:\w{32})\w*', url):
miaopai_download_by_url(url, output_dir, merge, info_only)
elif re.match(r'http://weibo.com/p/230444\w+', url):
_fid = match1(url, r'http://weibo.com/p/230444(\w+)')
miaopai_download_by_url('http://video.weibo.com/show?fid=1034:{_fid}'.format(_fid = _fid), output_dir, merge, info_only)
if re.match(r'^http[s]://.*\.weibo\.com/\d+/.+', url):
return miaopai_download_direct(url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
if re.match(r'^http[s]://.*\.weibo\.(com|cn)/s/video/.+', url):
return miaopai_download_story(url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
# FIXME!
if re.match(r'^http[s]://.*\.weibo\.com/tv/v/(\w+)', url):
return miaopai_download_direct(url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
fid = match1(url, r'\?fid=(\d{4}:\w+)')
if fid is not None:
miaopai_download_by_fid(fid, output_dir, merge, info_only)
elif '/p/230444' in url:
fid = match1(url, r'/p/230444(\w+)')
miaopai_download_by_fid('1034:'+fid, output_dir, merge, info_only)
else:
mobile_page = get_content(url, headers = fake_headers_mobile)
hit = re.search(r'"page_url"\s*:\s*"([^"]+)"', mobile_page)
if not hit:
raise Exception('Unknown pattern')
else:
escaped_url = hit.group(1)
miaopai_download(urllib.parse.unquote(escaped_url), output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
site_info = "miaopai"
download = miaopai_download
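One detail worth calling out from `miaopai_download_by_wbmp` above: the info request carries a randomly named JSONP callback parameter, built from 11 random lowercase letters and digits. A small sketch of that URL construction (the appid and fid values here are invented, and the callback parameter name is normally read from wb_mp.js):

```python
import random
import string
from urllib.parse import urlencode

def build_info_url(appid, fid, jsonp_param='callback'):
    population = string.ascii_lowercase + string.digits
    callback = '_jsonp' + ''.join(random.sample(population, 11))
    return 'http://p.weibo.com/aj_media/info?' + urlencode({
        'appid': appid,
        'fid': fid,
        jsonp_param: callback,
    })

print(build_info_url('1234', '1034:0123456789abcdef0123456789abcdef'))
```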

View File

@ -0,0 +1,361 @@
"""
MIT License
Copyright (c) 2019 WaferJay
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""
import json
import os
import re
from ..common import get_content, urls_size, log, player, dry_run
from ..extractor import VideoExtractor
_UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 ' \
'(KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'
class _NoMatchException(Exception):
pass
class _Dispatcher(object):
def __init__(self):
self.entry = []
def register(self, patterns, fun):
if not isinstance(patterns, (list, tuple)):
patterns = [patterns]
patterns = [re.compile(reg) for reg in patterns]
self.entry.append((patterns, fun))
def endpoint(self, *patterns):
assert patterns, 'patterns must not be empty'
def _wrap(fun):
self.register(patterns, fun)
return fun
return _wrap
def test(self, url):
return any(pa.search(url) for pas, _ in self.entry for pa in pas)
def dispatch(self, url, *args, **kwargs):
for patterns, fun in self.entry:
for pa in patterns:
match = pa.search(url)
if not match:
continue
kwargs.update(match.groupdict())
return fun(*args, **kwargs)
raise _NoMatchException()
missevan_stream_types = [
{'id': 'source', 'quality': '源文件', 'url_json_key': 'soundurl',
'resource_url_fmt': 'sound/{resource_url}'},
{'id': '320', 'quality': '320 Kbps', 'url_json_key': 'soundurl_64'},
{'id': '128', 'quality': '128 Kbps', 'url_json_key': 'soundurl_128'},
{'id': '32', 'quality': '32 Kbps', 'url_json_key': 'soundurl_32'},
{'id': 'covers', 'desc': '封面图', 'url_json_key': 'cover_image',
'default_src': 'covers/nocover.png',
'resource_url_fmt': 'covers/{resource_url}'},
{'id': 'coversmini', 'desc': '封面缩略图', 'url_json_key': 'cover_image',
'default_src': 'coversmini/nocover.png',
'resource_url_fmt': 'coversmini/{resource_url}'}
]
def _get_resource_uri(data, stream_type):
uri = data[stream_type['url_json_key']]
if not uri:
return stream_type.get('default_src')
uri_fmt = stream_type.get('resource_url_fmt')
if not uri_fmt:
return uri
return uri_fmt.format(resource_url=uri)
def is_covers_stream(stream):
stream = stream or ''
return stream.lower() in ('covers', 'coversmini')
def get_file_extension(file_path, default=''):
_, suffix = os.path.splitext(file_path)
if suffix:
# remove dot
suffix = suffix[1:]
return suffix or default
def best_quality_stream_id(streams, stream_types):
for stream_type in stream_types:
if streams.get(stream_type['id']):
return stream_type['id']
raise AssertionError('no stream selected')
class MissEvanWithStream(VideoExtractor):
name = 'MissEvan'
stream_types = missevan_stream_types
def __init__(self, *args):
super().__init__(*args)
self.referer = 'https://www.missevan.com/'
self.ua = _UA
@classmethod
def create(cls, title, streams, *, streams_sorted=None):
obj = cls()
obj.title = title
obj.streams.update(streams)
streams_sorted = streams_sorted or cls._setup_streams_sorted(streams)
obj.streams_sorted.extend(streams_sorted)
return obj
def set_danmaku(self, danmaku):
self.danmaku = danmaku
return self
@staticmethod
def _setup_streams_sorted(streams):
streams_sorted = []
for key, stream in streams.items():
copy_stream = stream.copy()
copy_stream['id'] = key
streams_sorted.append(copy_stream)
return streams_sorted
def download(self, **kwargs):
stream_id = kwargs.get('stream_id') or self.stream_types[0]['id']
stream = self.streams[stream_id]
if 'size' not in stream:
stream['size'] = urls_size(stream['src'])
super().download(**kwargs)
def unsupported_method(self, *args, **kwargs):
raise AssertionError('Unsupported')
download_by_url = unsupported_method
download_by_vid = unsupported_method
prepare = unsupported_method
extract = unsupported_method
class MissEvan(VideoExtractor):
name = 'MissEvan'
stream_types = missevan_stream_types
def __init__(self, *args):
super().__init__(*args)
self.referer = 'https://www.missevan.com/'
self.ua = _UA
self.__headers = {'User-Agent': self.ua, 'Referer': self.referer}
__prepare_dispatcher = _Dispatcher()
@__prepare_dispatcher.endpoint(
re.compile(r'missevan\.com/sound/(?:player\?.*?id=)?(?P<sid>\d+)', re.I))
def prepare_sound(self, sid, **kwargs):
json_data = self._get_json(self.url_sound_api(sid))
sound = json_data['info']['sound']
self.title = sound['soundstr']
if sound.get('need_pay'):
log.e('付费资源无法下载')
return
if not is_covers_stream(kwargs.get('stream_id')) and not dry_run:
self.danmaku = self._get_content(self.url_danmaku_api(sid))
self.streams = self.setup_streams(sound)
@classmethod
def setup_streams(cls, sound):
streams = {}
for stream_type in cls.stream_types:
uri = _get_resource_uri(sound, stream_type)
resource_url = cls.url_resource(uri) if uri else None
if resource_url:
container = get_file_extension(resource_url)
stream_id = stream_type['id']
streams[stream_id] = {'src': [resource_url], 'container': container}
quality = stream_type.get('quality')
if quality:
streams[stream_id]['quality'] = quality
return streams
def prepare(self, **kwargs):
if self.vid:
self.prepare_sound(self.vid, **kwargs)
return
try:
self.__prepare_dispatcher.dispatch(self.url, self, **kwargs)
except _NoMatchException:
log.e('[Error] Unsupported URL pattern.')
exit(1)
@staticmethod
def download_covers(title, streams, **kwargs):
if not is_covers_stream(kwargs.get('stream_id')) \
and not kwargs.get('json_output') \
and not kwargs.get('info_only') \
and not player:
kwargs['stream_id'] = 'covers'
MissEvanWithStream \
.create(title, streams) \
.download(**kwargs)
_download_playlist_dispatcher = _Dispatcher()
@_download_playlist_dispatcher.endpoint(
re.compile(r'missevan\.com/album(?:info)?/(?P<aid>\d+)', re.I))
def download_album(self, aid, **kwargs):
json_data = self._get_json(self.url_album_api(aid))
album = json_data['info']['album']
self.title = album['title']
sounds = json_data['info']['sounds']
output_dir = os.path.abspath(kwargs.pop('output_dir', '.'))
output_dir = os.path.join(output_dir, self.title)
kwargs['output_dir'] = output_dir
for sound in sounds:
sound_title = sound['soundstr']
if sound.get('need_pay'):
log.w('跳过付费资源: ' + sound_title)
continue
streams = self.setup_streams(sound)
extractor = MissEvanWithStream.create(sound_title, streams)
if not dry_run:
sound_id = sound['id']
danmaku = self._get_content(self.url_danmaku_api(sound_id))
extractor.set_danmaku(danmaku)
extractor.download(**kwargs)
self.download_covers(sound_title, streams, **kwargs)
@_download_playlist_dispatcher.endpoint(
re.compile(r'missevan\.com(?:/mdrama)?/drama/(?P<did>\d+)', re.I))
def download_drama(self, did, **kwargs):
json_data = self._get_json(self.url_drama_api(did))
drama = json_data['info']['drama']
if drama.get('need_pay'):
log.w('该剧集包含付费资源, 付费资源将被跳过')
self.title = drama['name']
output_dir = os.path.abspath(kwargs.pop('output_dir', '.'))
output_dir = os.path.join(output_dir, self.title)
kwargs['output_dir'] = output_dir
episodes = json_data['info']['episodes']
for each in episodes['episode']:
if each.get('need_pay'):
log.w('跳过付费资源: ' + each['soundstr'])
continue
sound_id = each['sound_id']
MissEvan().download_by_vid(sound_id, **kwargs)
def download_playlist_by_url(self, url, **kwargs):
self.url = url
try:
self._download_playlist_dispatcher.dispatch(url, self, **kwargs)
except _NoMatchException:
log.e('[Error] Unsupported URL pattern with --playlist option.')
exit(1)
def download_by_url(self, url, **kwargs):
if not kwargs.get('playlist') and self._download_playlist_dispatcher.test(url):
log.w('This is an album or drama. (use --playlist option to download all).')
else:
super().download_by_url(url, **kwargs)
def download(self, **kwargs):
kwargs['keep_obj'] = True # keep the self.streams to download cover
super().download(**kwargs)
self.download_covers(self.title, self.streams, **kwargs)
def extract(self, **kwargs):
stream_id = kwargs.get('stream_id')
# fetch all streams size when output info or json
if kwargs.get('info_only') and not stream_id \
or kwargs.get('json_output'):
for _, stream in self.streams.items():
stream['size'] = urls_size(stream['src'])
return
# fetch size of the selected stream only
if not stream_id:
stream_id = best_quality_stream_id(self.streams, self.stream_types)
stream = self.streams[stream_id]
if 'size' not in stream:
stream['size'] = urls_size(stream['src'])
def _get_content(self, url):
return get_content(url, headers=self.__headers)
def _get_json(self, url):
content = self._get_content(url)
return json.loads(content)
@staticmethod
def url_album_api(album_id):
return 'https://www.missevan.com/sound' \
'/soundalllist?albumid=' + str(album_id)
@staticmethod
def url_sound_api(sound_id):
return 'https://www.missevan.com/sound' \
'/getsound?soundid=' + str(sound_id)
@staticmethod
def url_drama_api(drama_id):
return 'https://www.missevan.com/dramaapi' \
'/getdrama?drama_id=' + str(drama_id)
@staticmethod
def url_danmaku_api(sound_id):
return 'https://www.missevan.com/sound/getdm?soundid=' + str(sound_id)
@staticmethod
def url_resource(uri):
return 'https://static.missevan.com/' + uri
site = MissEvan()
site_info = 'MissEvan.com'
download = site.download_by_url
download_playlist = site.download_playlist_by_url
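The `_Dispatcher` used by this extractor maps compiled regex patterns to handler functions and forwards any named groups as keyword arguments. A condensed, standalone re-implementation to illustrate the idea (the real class also offers `register()` and `test()`):

```python
import re

class Dispatcher:
    def __init__(self):
        self.entry = []

    def endpoint(self, *patterns):
        def _wrap(fun):
            self.entry.append(([re.compile(p) for p in patterns], fun))
            return fun
        return _wrap

    def dispatch(self, url, *args, **kwargs):
        for patterns, fun in self.entry:
            for pattern in patterns:
                match = pattern.search(url)
                if match:
                    kwargs.update(match.groupdict())  # named groups become kwargs
                    return fun(*args, **kwargs)
        raise LookupError('no pattern matched ' + url)

d = Dispatcher()

@d.endpoint(r'missevan\.com/sound/(?:player\?.*?id=)?(?P<sid>\d+)')
def handle_sound(sid):
    return 'sound ' + sid

print(d.dispatch('https://www.missevan.com/sound/12345'))  # sound 12345
```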

View File

@ -1,38 +0,0 @@
#!/usr/bin/env python
from ..common import *
from ..extractor import VideoExtractor
import json
class MusicPlayOn(VideoExtractor):
name = "MusicPlayOn"
stream_types = [
{'id': '720p HD'},
{'id': '360p SD'},
]
def prepare(self, **kwargs):
content = get_content(self.url)
self.title = match1(content,
r'setup\[\'title\'\] = "([^"]+)";')
for s in self.stream_types:
quality = s['id']
src = match1(content,
r'src: "([^"]+)", "data-res": "%s"' % quality)
if src is not None:
url = 'http://en.musicplayon.com%s' % src
self.streams[quality] = {'url': url}
def extract(self, **kwargs):
for i in self.streams:
s = self.streams[i]
_, s['container'], s['size'] = url_info(s['url'])
s['src'] = [s['url']]
site = MusicPlayOn()
download = site.download_by_url
# TBD: implement download_playlist

View File

@ -17,6 +17,10 @@ def nanagogo_download(url, output_dir='.', merge=True, info_only=False, **kwargs
info = json.loads(get_content(api_url))
items = []
if info['data']['posts']['post'] is None:
return
if info['data']['posts']['post']['body'] is None:
return
for i in info['data']['posts']['post']['body']:
if 'image' in i:
image_url = i['image']

View File

@ -1,48 +1,40 @@
#!/usr/bin/env python
__all__ = ['naver_download']
import urllib.request, urllib.parse
from ..common import *
import urllib.request
import urllib.parse
import json
import re
def naver_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
from ..util import log
from ..common import get_content, download_urls, print_info, playlist_not_supported, url_size
from .universal import *
assert re.search(r'http://tvcast.naver.com/v/', url), "URL is not supported"
__all__ = ['naver_download_by_url']
html = get_html(url)
contentid = re.search(r'var rmcPlayer = new nhn.rmcnmv.RMCVideoPlayer\("(.+?)", "(.+?)"',html)
videoid = contentid.group(1)
inkey = contentid.group(2)
assert videoid
assert inkey
info_key = urllib.parse.urlencode({'vid': videoid, 'inKey': inkey, })
down_key = urllib.parse.urlencode({'masterVid': videoid,'protocol': 'p2p','inKey': inkey, })
inf_xml = get_html('http://serviceapi.rmcnmv.naver.com/flash/videoInfo.nhn?%s' % info_key )
from xml.dom.minidom import parseString
doc_info = parseString(inf_xml)
Subject = doc_info.getElementsByTagName('Subject')[0].firstChild
title = Subject.data
assert title
xml = get_html('http://serviceapi.rmcnmv.naver.com/flash/playableEncodingOption.nhn?%s' % down_key )
doc = parseString(xml)
encodingoptions = doc.getElementsByTagName('EncodingOption')
old_height = doc.getElementsByTagName('height')[0]
real_url= ''
#to download the highest resolution one,
for node in encodingoptions:
new_height = node.getElementsByTagName('height')[0]
domain_node = node.getElementsByTagName('Domain')[0]
uri_node = node.getElementsByTagName('uri')[0]
if int(new_height.firstChild.data) > int (old_height.firstChild.data):
real_url= domain_node.firstChild.data+ '/' +uri_node.firstChild.data
type, ext, size = url_info(real_url)
print_info(site_info, title, type, size)
def naver_download_by_url(url, output_dir='.', merge=True, info_only=False, **kwargs):
ep = 'https://apis.naver.com/rmcnmv/rmcnmv/vod/play/v2.0/{}?key={}'
page = get_content(url)
try:
vid = re.search(r"\"videoId\"\s*:\s*\"(.+?)\"", page).group(1)
key = re.search(r"\"inKey\"\s*:\s*\"(.+?)\"", page).group(1)
meta_str = get_content(ep.format(vid, key))
meta_json = json.loads(meta_str)
if 'errorCode' in meta_json:
log.wtf(meta_json['errorCode'])
title = meta_json['meta']['subject']
videos = meta_json['videos']['list']
video_list = sorted(videos, key=lambda video: video['encodingOption']['width'])
video_url = video_list[-1]['source']
# size = video_list[-1]['size']
# the size reported by the API is wrong, so measure the real URL instead
size = url_size(video_url)
print_info(site_info, title, 'mp4', size)
if not info_only:
download_urls([real_url], title, ext, size, output_dir, merge = merge)
download_urls([video_url], title, 'mp4', size, output_dir, **kwargs)
except:
universal_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
site_info = "tvcast.naver.com"
download = naver_download
site_info = "naver.com"
download = naver_download_by_url
download_playlist = playlist_not_supported('naver')
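The selection logic in the new naver path is simple: the play API returns a list of encodings, and the widest one wins. A sketch of just that step on an invented response:

```python
# invented, trimmed-down shape of the play API response
meta = {
    'meta': {'subject': 'Sample clip'},
    'videos': {'list': [
        {'encodingOption': {'width': 640},  'source': 'http://example.com/360p.mp4'},
        {'encodingOption': {'width': 1920}, 'source': 'http://example.com/1080p.mp4'},
        {'encodingOption': {'width': 1280}, 'source': 'http://example.com/720p.mp4'},
    ]},
}

title = meta['meta']['subject']
videos = sorted(meta['videos']['list'], key=lambda v: v['encodingOption']['width'])
video_url = videos[-1]['source']  # the widest encoding
print(title, video_url)           # Sample clip http://example.com/1080p.mp4
```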

View File

@ -22,14 +22,14 @@ def netease_hymn():
"""
def netease_cloud_music_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
rid = match1(url, r'id=(.*)')
rid = match1(url, r'\Wid=(.*)')
if rid is None:
rid = match1(url, r'/(\d+)/?$')
rid = match1(url, r'/(\d+)/?')
if "album" in url:
j = loads(get_content("http://music.163.com/api/album/%s?id=%s&csrf_token=" % (rid, rid), headers={"Referer": "http://music.163.com/"}))
artist_name = j['album']['artists'][0]['name']
album_name = j['album']['name']
album_name = j['album']['name'].strip()
new_dir = output_dir + '/' + fs.legitimize("%s - %s" % (artist_name, album_name))
if not info_only:
if not os.path.exists(new_dir):
@ -55,12 +55,14 @@ def netease_cloud_music_download(url, output_dir='.', merge=True, info_only=Fals
cover_url = j['result']['coverImgUrl']
download_urls([cover_url], "cover", "jpg", 0, new_dir)
for i in j['result']['tracks']:
netease_song_download(i, output_dir=new_dir, info_only=info_only)
prefix_width = len(str(len(j['result']['tracks'])))
for n, i in enumerate(j['result']['tracks']):
playlist_prefix = '%%.%dd_' % prefix_width % n
netease_song_download(i, output_dir=new_dir, info_only=info_only, playlist_prefix=playlist_prefix)
try: # download lyrics
assert kwargs['caption']
l = loads(get_content("http://music.163.com/api/song/lyric/?id=%s&lv=-1&csrf_token=" % i['id'], headers={"Referer": "http://music.163.com/"}))
netease_lyric_download(i, l["lrc"]["lyric"], output_dir=new_dir, info_only=info_only)
netease_lyric_download(i, l["lrc"]["lyric"], output_dir=new_dir, info_only=info_only, playlist_prefix=playlist_prefix)
except: pass
elif "song" in url:
@ -85,10 +87,10 @@ def netease_cloud_music_download(url, output_dir='.', merge=True, info_only=Fals
j = loads(get_content("http://music.163.com/api/mv/detail/?id=%s&ids=[%s]&csrf_token=" % (rid, rid), headers={"Referer": "http://music.163.com/"}))
netease_video_download(j['data'], output_dir=output_dir, info_only=info_only)
def netease_lyric_download(song, lyric, output_dir='.', info_only=False):
def netease_lyric_download(song, lyric, output_dir='.', info_only=False, playlist_prefix=""):
if info_only: return
title = "%s. %s" % (song['position'], song['name'])
title = "%s%s. %s" % (playlist_prefix, song['position'], song['name'])
filename = '%s.lrc' % get_filename(title)
print('Saving %s ...' % filename, end="", flush=True)
with open(os.path.join(output_dir, filename),
@ -103,8 +105,11 @@ def netease_video_download(vinfo, output_dir='.', info_only=False):
netease_download_common(title, url_best,
output_dir=output_dir, info_only=info_only)
def netease_song_download(song, output_dir='.', info_only=False):
title = "%s. %s" % (song['position'], song['name'])
def netease_song_download(song, output_dir='.', info_only=False, playlist_prefix=""):
title = "%s%s. %s" % (playlist_prefix, song['position'], song['name'])
url_best = "http://music.163.com/song/media/outer/url?id=" + \
str(song['id']) + ".mp3"
'''
songNet = 'p' + song['mp3Url'].split('/')[2][1:]
if 'hMusic' in song and song['hMusic'] != None:
@ -113,15 +118,15 @@ def netease_song_download(song, output_dir='.', info_only=False):
url_best = song['mp3Url']
elif 'bMusic' in song:
url_best = make_url(songNet, song['bMusic']['dfsId'])
'''
netease_download_common(title, url_best,
output_dir=output_dir, info_only=info_only)
def netease_download_common(title, url_best, output_dir, info_only):
songtype, ext, size = url_info(url_best)
songtype, ext, size = url_info(url_best, faker=True)
print_info(site_info, title, songtype, size)
if not info_only:
download_urls([url_best], title, ext, size, output_dir)
download_urls([url_best], title, ext, size, output_dir, faker=True)
def netease_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
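The `playlist_prefix` added above keeps album tracks sorting in playlist order: `'%%.%dd_' % prefix_width` first builds a format string such as `'%.2d_'`, which is then applied to the track index. A worked example of just that formatting:

```python
tracks = ['intro', 'verse', 'outro']

prefix_width = len(str(len(tracks)))                # digits needed for the largest index
for n, name in enumerate(tracks):
    playlist_prefix = '%%.%dd_' % prefix_width % n  # '%.1d_' -> '0_', '1_', '2_'
    print(playlist_prefix + name)                   # 0_intro, 1_verse, 2_outro
```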

View File

@ -31,10 +31,11 @@ context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
nicovideo_login(user, password)
html = get_html(url) # necessary!
title = unicodize(r1(r'<span class="videoHeaderTitle"[^>]*>([^<]+)</span>', html))
title = r1(r'<title>(.+?)</title>', html)
#title = unicodize(r1(r'<span class="videoHeaderTitle"[^>]*>([^<]+)</span>', html))
vid = url.split('/')[-1].split('?')[0]
api_html = get_html('http://www.nicovideo.jp/api/getflv?v=%s' % vid)
api_html = get_html('http://flapi.nicovideo.jp/api/getflv?v=%s' % vid)
real_url = parse.unquote(r1(r'url=([^&]+)&', api_html))
type, ext, size = url_info(real_url)

View File

@ -1,33 +0,0 @@
#!/usr/bin/env python
__all__ = ['panda_download']
from ..common import *
import json
import time
def panda_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
roomid = url[url.rfind('/')+1:]
json_request_url = 'http://www.panda.tv/api_room?roomid={}&pub_key=&_={}'.format(roomid, int(time.time()))
content = get_html(json_request_url)
errno = json.loads(content)['errno']
errmsg = json.loads(content)['errmsg']
if errno:
raise ValueError("Errno : {}, Errmsg : {}".format(errno, errmsg))
data = json.loads(content)['data']
title = data.get('roominfo')['name']
room_key = data.get('videoinfo')['room_key']
plflag = data.get('videoinfo')['plflag'].split('_')
status = data.get('videoinfo')['status']
if status is not "2":
raise ValueError("The live stream is not online! (status:%s)" % status)
real_url = 'http://pl{}.live.panda.tv/live_panda/{}.flv'.format(plflag[1],room_key)
print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_urls([real_url], title, 'flv', None, output_dir, merge = merge)
site_info = "panda.tv"
download = panda_download
download_playlist = playlist_not_supported('panda')

View File

@ -1,154 +1,229 @@
#!/usr/bin/env python
__all__ = ['pptv_download', 'pptv_download_by_id']
#__all__ = ['pptv_download', 'pptv_download_by_id']
from ..common import *
from ..extractor import VideoExtractor
import re
import time
import urllib
from random import random
import random
import binascii
from xml.dom.minidom import parseString
def constructKey(arg):
def lshift(a, b):
return (a << b) & 0xffffffff
def rshift(a, b):
if a >= 0:
return a >> b
return (0x100000000 + a) >> b
def str2hex(s):
r=""
for i in s[:8]:
t=hex(ord(i))[2:]
if len(t)==1:
t="0"+t
r+=t
for i in range(16):
r+=hex(int(15*random()))[2:]
return r
def le32_pack(b_str):
result = 0
result |= b_str[0]
result |= (b_str[1] << 8)
result |= (b_str[2] << 16)
result |= (b_str[3] << 24)
return result
#ABANDONED Because SERVER_KEY is static
def getkey(s):
#returns 1896220160
l2=[i for i in s]
l4=0
l3=0
while l4<len(l2):
l5=l2[l4]
l6=ord(l5)
l7=l6<<((l4%4)*8)
l3=l3^l7
l4+=1
return l3
pass
def rot(k,b): ##>>> in as3
if k>=0:
return k>>b
elif k<0:
return (2**32+k)>>b
pass
def lot(k,b):
return (k<<b)%(2**32)
#WTF?
def encrypt(arg1,arg2):
def tea_core(data, key_seg):
delta = 2654435769
l3=16;
l4=getkey(arg2) #1896220160
l8=[i for i in arg1]
l10=l4;
l9=[i for i in arg2]
l5=lot(l10,8)|rot(l10,24)#101056625
# assert l5==101056625
l6=lot(l10,16)|rot(l10,16)#100692230
# assert 100692230==l6
l7=lot(l10,24)|rot(l10,8)
# assert 7407110==l7
l11=""
l12=0
l13=ord(l8[l12])<<0
l14=ord(l8[l12+1])<<8
l15=ord(l8[l12+2])<<16
l16=ord(l8[l12+3])<<24
l17=ord(l8[l12+4])<<0
l18=ord(l8[l12+5])<<8
l19=ord(l8[l12+6])<<16
l20=ord(l8[l12+7])<<24
l21=(((0|l13)|l14)|l15)|l16
l22=(((0|l17)|l18)|l19)|l20
d0 = le32_pack(data[:4])
d1 = le32_pack(data[4:8])
l23=0
l24=0
while l24<32:
l23=(l23+delta)%(2**32)
l33=(lot(l22,4)+l4)%(2**32)
l34=(l22+l23)%(2**32)
l35=(rot(l22,5)+l5)%(2**32)
l36=(l33^l34)^l35
l21=(l21+l36)%(2**32)
l37=(lot(l21,4)+l6)%(2**32)
l38=(l21+l23)%(2**32)
l39=(rot(l21,5))%(2**32)
l40=(l39+l7)%(2**32)
l41=((l37^l38)%(2**32)^l40)%(2**32)
l22=(l22+l41)%(2**32)
sum_ = 0
for rnd in range(32):
sum_ = (sum_ + delta) & 0xffffffff
p1 = (lshift(d1, 4) + key_seg[0]) & 0xffffffff
p2 = (d1 + sum_) & 0xffffffff
p3 = (rshift(d1, 5) + key_seg[1]) & 0xffffffff
l24+=1
mid_p = p1 ^ p2 ^ p3
d0 = (d0 + mid_p) & 0xffffffff
l11+=chr(rot(l21,0)&0xff)
l11+=chr(rot(l21,8)&0xff)
l11+=chr(rot(l21,16)&0xff)
l11+=chr(rot(l21,24)&0xff)
l11+=chr(rot(l22,0)&0xff)
l11+=chr(rot(l22,8)&0xff)
l11+=chr(rot(l22,16)&0xff)
l11+=chr(rot(l22,24)&0xff)
p4 = (lshift(d0, 4) + key_seg[2]) & 0xffffffff
p5 = (d0 + sum_) & 0xffffffff
p6 = (rshift(d0, 5) + key_seg[3]) & 0xffffffff
return l11
mid_p = p4 ^ p5 ^ p6
d1 = (d1 + mid_p) & 0xffffffff
return bytes(unpack_le32(d0) + unpack_le32(d1))
def ran_hex(size):
result = []
for i in range(size):
result.append(hex(int(15 * random.random()))[2:])
return ''.join(result)
def zpad(b_str, size):
size_diff = size - len(b_str)
return b_str + bytes(size_diff)
def gen_key(t):
key_seg = [1896220160,101056625, 100692230, 7407110]
t_s = hex(int(t))[2:].encode('utf8')
input_data = zpad(t_s, 16)
out = tea_core(input_data, key_seg)
return binascii.hexlify(out[:8]).decode('utf8') + ran_hex(16)
def unpack_le32(i32):
result = []
result.append(i32 & 0xff)
i32 = rshift(i32, 8)
result.append(i32 & 0xff)
i32 = rshift(i32, 8)
result.append(i32 & 0xff)
i32 = rshift(i32, 8)
result.append(i32 & 0xff)
return result
def get_elem(elem, tag):
return elem.getElementsByTagName(tag)
def get_attr(elem, attr):
return elem.getAttribute(attr)
def get_text(elem):
return elem.firstChild.nodeValue
def shift_time(time_str):
ts = time_str[:-4]
return time.mktime(time.strptime(ts)) - 60
def parse_pptv_xml(dom):
channel = get_elem(dom, 'channel')[0]
title = get_attr(channel, 'nm')
file_list = get_elem(channel, 'file')[0]
item_list = get_elem(file_list, 'item')
streams_cnt = len(item_list)
item_mlist = []
for item in item_list:
rid = get_attr(item, 'rid')
file_type = get_attr(item, 'ft')
size = get_attr(item, 'filesize')
width = get_attr(item, 'width')
height = get_attr(item, 'height')
bitrate = get_attr(item, 'bitrate')
res = '{}x{}@{}kbps'.format(width, height, bitrate)
item_meta = (file_type, rid, size, res)
item_mlist.append(item_meta)
dt_list = get_elem(dom, 'dt')
dragdata_list = get_elem(dom, 'dragdata')
stream_mlist = []
for dt in dt_list:
file_type = get_attr(dt, 'ft')
serv_time = get_text(get_elem(dt, 'st')[0])
expr_time = get_text(get_elem(dt, 'key')[0])
serv_addr = get_text(get_elem(dt, 'sh')[0])
stream_meta = (file_type, serv_addr, expr_time, serv_time)
stream_mlist.append(stream_meta)
segs_mlist = []
for dd in dragdata_list:
file_type = get_attr(dd, 'ft')
seg_list = get_elem(dd, 'sgm')
segs = []
segs_size = []
for seg in seg_list:
rid = get_attr(seg, 'rid')
size = get_attr(seg, 'fs')
segs.append(rid)
segs_size.append(size)
segs_meta = (file_type, segs, segs_size)
segs_mlist.append(segs_meta)
return title, item_mlist, stream_mlist, segs_mlist
# merge the three metadata lists into a single streams dict
def merge_meta(item_mlist, stream_mlist, segs_mlist):
streams = {}
for i in range(len(segs_mlist)):
streams[str(i)] = {}
for item in item_mlist:
stream = streams[item[0]]
stream['rid'] = item[1]
stream['size'] = item[2]
stream['res'] = item[3]
for s in stream_mlist:
stream = streams[s[0]]
stream['serv_addr'] = s[1]
stream['expr_time'] = s[2]
stream['serv_time'] = s[3]
for seg in segs_mlist:
stream = streams[seg[0]]
stream['segs'] = seg[1]
stream['segs_size'] = seg[2]
return streams
loc1=hex(int(arg))[2:]+(16-len(hex(int(arg))[2:]))*"\x00"
SERVER_KEY="qqqqqww"+"\x00"*9
res=encrypt(loc1,SERVER_KEY)
return str2hex(res)
def make_url(stream):
host = stream['serv_addr']
rid = stream['rid']
key = gen_key(shift_time(stream['serv_time']))
key_expr = stream['expr_time']
src = []
for i, seg in enumerate(stream['segs']):
url = 'http://{}/{}/{}?key={}&k={}'.format(host, i, rid, key, key_expr)
url += '&type=web.fpp'
src.append(url)
return src
def pptv_download_by_id(id, title = None, output_dir = '.', merge = True, info_only = False):
xml = get_html('http://web-play.pptv.com/webplay3-0-%s.xml?type=web.fpp' % id)
#vt=3 means vod mode vt=5 means live mode
host = r1(r'<sh>([^<>]+)</sh>', xml)
k = r1(r'<key expire=[^<>]+>([^<>]+)</key>', xml)
rid = r1(r'rid="([^"]+)"', xml)
title = r1(r'nm="([^"]+)"', xml)
class PPTV(VideoExtractor):
name = 'PPTV'
stream_types = [
{'itag': '4'},
{'itag': '3'},
{'itag': '2'},
{'itag': '1'},
{'itag': '0'},
]
st=r1(r'<st>([^<>]+)</st>',xml)[:-4]
st=time.mktime(time.strptime(st))*1000-60*1000-time.time()*1000
st+=time.time()*1000
st=st/1000
def prepare(self, **kwargs):
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/69.0.3497.100 Safari/537.36"
}
self.vid = match1(self.url, r'https?://sports.pptv.com/vod/(\d+)/*')
if self.url and not self.vid:
if not re.match(r'https?://v.pptv.com/show/(\w+)\.html', self.url):
raise Exception('Unknown url pattern')
page_content = get_content(self.url, headers)
key=constructKey(st)
self.vid = match1(page_content, r'webcfg\s*=\s*{"id":\s*(\d+)')
if not self.vid:
request = urllib.request.Request(self.url, headers=headers)
response = urllib.request.urlopen(request)
self.vid = match1(response.url, r'https?://sports.pptv.com/vod/(\d+)/*')
pieces = re.findall('<sgm no="(\d+)"[^<>]+fs="(\d+)"', xml)
numbers, fs = zip(*pieces)
urls=["http://{}/{}/{}?key={}&fpp.ver=1.3.0.4&k={}&type=web.fpp".format(host,i,rid,key,k) for i in range(max(map(int,numbers))+1)]
if not self.vid:
raise Exception('Cannot find id')
api_url = 'http://web-play.pptv.com/webplay3-0-{}.xml'.format(self.vid)
api_url += '?type=web.fpp&param=type=web.fpp&version=4'
dom = parseString(get_content(api_url, headers))
self.title, m_items, m_streams, m_segs = parse_pptv_xml(dom)
xml_streams = merge_meta(m_items, m_streams, m_segs)
for stream_id in xml_streams:
stream_data = xml_streams[stream_id]
src = make_url(stream_data)
self.streams[stream_id] = {
'container': 'mp4',
'video_profile': stream_data['res'],
'size': int(stream_data['size']),
'src': src
}
total_size = sum(map(int, fs))
assert rid.endswith('.mp4')
print_info(site_info, title, 'mp4', total_size)
if not info_only:
try:
download_urls(urls, title, 'mp4', total_size, output_dir = output_dir, merge = merge)
except urllib.error.HTTPError:
#for key expired
pptv_download_by_id(id, output_dir = output_dir, merge = merge, info_only = info_only)
def pptv_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
assert re.match(r'http://v.pptv.com/show/(\w+)\.html$', url)
html = get_html(url)
id = r1(r'webcfg\s*=\s*{"id":\s*(\d+)', html)
assert id
pptv_download_by_id(id, output_dir = output_dir, merge = merge, info_only = info_only)
site_info = "PPTV.com"
download = pptv_download
site = PPTV()
#site_info = "PPTV.com"
#download = pptv_download
download = site.download_by_url
download_playlist = playlist_not_supported('pptv')
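The rewritten key code treats data as little-endian 32-bit words and runs a TEA-like 32-round mix over them with the usual delta 2654435769 (0x9E3779B9); `le32_pack` and `unpack_le32` above do the byte/word conversion. A quick round-trip example of that packing:

```python
def le32_pack(b):
    # four bytes, least significant first, into one 32-bit int
    return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24)

def unpack_le32(i32):
    # inverse of le32_pack
    return [i32 & 0xff, (i32 >> 8) & 0xff, (i32 >> 16) & 0xff, (i32 >> 24) & 0xff]

word = le32_pack(b'\x01\x02\x03\x04')
print(hex(word))                 # 0x4030201
print(bytes(unpack_le32(word)))  # b'\x01\x02\x03\x04'
```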

View File

@ -1,40 +0,0 @@
#!/usr/bin/env python
__all__ = ['qianmo_download']
from ..common import *
import urllib.error
import json
def qianmo_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
if re.match(r'http://qianmo.com/\w+', url):
html = get_html(url)
match = re.search(r'(.+?)var video =(.+?);', html)
if match:
video_info_json = json.loads(match.group(2))
title = video_info_json['title']
ext_video_id = video_info_json['ext_video_id']
html = get_content('http://v.qianmo.com/player/{ext_video_id}'.format(ext_video_id = ext_video_id))
c = json.loads(html)
url_list = []
for i in c['seg']: #Cannot do list comprehensions
for a in c['seg'][i]:
for b in a['url']:
url_list.append(b[0])
type_ = ''
size = 0
for url in url_list:
_, type_, temp = url_info(url)
size += temp
type, ext, size = url_info(url)
print_info(site_info, title, type_, size)
if not info_only:
download_urls(url_list, title, type_, total_size=None, output_dir=output_dir, merge=merge)
site_info = "qianmo"
download = qianmo_download
download_playlist = playlist_not_supported('qianmo')

View File

@ -3,6 +3,7 @@
from ..common import *
from ..extractor import VideoExtractor
from ..util.log import *
from json import loads
@ -19,13 +20,32 @@ class QiE(VideoExtractor):
id_dic = {i['video_profile']:(i['id']) for i in stream_types}
api_endpoint = 'http://www.qie.tv/api/v1/room/{room_id}'
game_ep = 'http://live.qq.com/game/game_details/get_game_details_info/'
@staticmethod
def get_vid_from_url(url):
def get_room_id_from_url(self, match_id):
meta = json.loads(get_content(self.game_ep + str(match_id)))
if meta['error'] != 0:
log.wtf('Error happens when accessing game_details api')
rooms = meta['data']['anchor_data']
for room in rooms:
if room['is_use_room']:
return room['room_id']
log.wtf('No room available for match {}'.format(match_id))
def get_vid_from_url(self, url):
"""Extracts video ID from live.qq.com.
"""
hit = re.search(r'live.qq.com/(\d+)', url)
if hit is not None:
return hit.group(1)
hit = re.search(r'live.qq.com/directory/match/(\d+)', url)
if hit is not None:
return self.get_room_id_from_url(hit.group(1))
html = get_content(url)
return match1(html, r'room_id\":(\d+)')
room_id = match1(html, r'room_id\":(\d+)')
if room_id is None:
log.wtf('Unknown page {}'.format(url))
return room_id
def download_playlist_by_url(self, url, **kwargs):
pass
@ -38,7 +58,7 @@ class QiE(VideoExtractor):
content = loads(content)
self.title = content['data']['room_name']
rtmp_url = content['data']['rtmp_url']
#stream_avalable = [i['name'] for i in content['data']['stream']]
#stream_available = [i['name'] for i in content['data']['stream']]
stream_available = {}
stream_available['normal'] = rtmp_url + '/' + content['data']['rtmp_live']
if len(content['data']['rtmp_multi_bitrate']) > 0:

View File

@ -0,0 +1,77 @@
from ..common import *
from ..extractor import VideoExtractor
from ..util.log import *
import json
import math
class QieVideo(VideoExtractor):
name = 'QiE Video'
vid_patt = r'"stream_name":"(\d+)"'
title_patt = r'"title":"([^\"]+)"'
cdn = 'http://qietv-play.wcs.8686c.com/'
ep = 'http://api.qiecdn.com/api/v1/video/stream/{}'
stream_types = [
{'id':'1080p', 'video_profile':'1920x1080', 'container':'m3u8'},
{'id':'720p', 'video_profile':'1280x720', 'container':'m3u8'},
{'id':'480p', 'video_profile':'853x480', 'container':'m3u8'}
]
def get_vid_from_url(self):
hit = re.search(self.__class__.vid_patt, self.page)
if hit is None:
log.wtf('Cannot get stream_id')
return hit.group(1)
def get_title(self):
hit = re.search(self.__class__.title_patt, self.page)
if hit is None:
return self.vid
return hit.group(1).strip()
def prepare(self, **kwargs):
self.page = get_content(self.url)
if self.vid is None:
self.vid = self.get_vid_from_url()
self.title = self.get_title()
meta = json.loads(get_content(self.__class__.ep.format(self.vid)))
if meta['code'] != 200:
log.wtf(meta['message'])
for video in meta['result']['videos']:
height = video['height']
url = self.__class__.cdn + video['key']
stream_meta = dict(m3u8_url=url, size=0, container='m3u8')
video_profile = '{}x{}'.format(video['width'], video['height'])
stream_meta['video_profile'] = video_profile
for stream_type in self.__class__.stream_types:
if height // 10 == int(stream_type['id'][:-1]) // 10:
# heights 481, 482 ... 489 are all treated as 480p here
stream_id = stream_type['id']
self.streams[stream_id] = stream_meta
def extract(self, **kwargs):
for stream_id in self.streams:
self.streams[stream_id]['src'], dur = general_m3u8_extractor(self.streams[stream_id]['m3u8_url'])
self.streams[stream_id]['video_profile'] += ', Duration: {}s'.format(math.floor(dur))
def general_m3u8_extractor(url):
dur = 0
base_url = url[:url.rfind('/')]
m3u8_content = get_content(url).split('\n')
result = []
for line in m3u8_content:
trimmed = line.strip()
if len(trimmed) > 0:
if trimmed.startswith('#'):
if trimmed.startswith('#EXTINF'):
t_str = re.search(r'(\d+\.\d+)', trimmed).group(1)
dur += float(t_str)
else:
if trimmed.startswith('http'):
result.append(trimmed)
else:
result.append(base_url + '/' + trimmed)
return result, dur
site = QieVideo()
download_by_url = site.download_by_url
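
The general_m3u8_extractor helper added above walks a variant playlist line by line, summing #EXTINF durations and resolving relative segment paths against the playlist's base URL. A minimal offline sketch of the same logic, using a made-up playlist and base URL:

```
import re

sample_m3u8 = (
    '#EXTM3U\n'
    '#EXT-X-TARGETDURATION:10\n'
    '#EXTINF:9.98,\n'
    'seg-000.ts\n'
    '#EXTINF:4.02,\n'
    'http://cdn.example.com/abs/seg-001.ts\n'
    '#EXT-X-ENDLIST\n'
)
base_url = 'http://cdn.example.com/vod'  # hypothetical playlist location

duration = 0.0
segments = []
for line in sample_m3u8.split('\n'):
    trimmed = line.strip()
    if not trimmed:
        continue
    if trimmed.startswith('#'):
        # only #EXTINF tags carry per-segment durations
        if trimmed.startswith('#EXTINF'):
            duration += float(re.search(r'(\d+\.\d+)', trimmed).group(1))
    elif trimmed.startswith('http'):
        segments.append(trimmed)                    # already absolute
    else:
        segments.append(base_url + '/' + trimmed)   # resolve relative path

print(segments)             # two absolute segment URLs
print(round(duration, 2))   # 14.0, floored to whole seconds in extract()
```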

View File

@ -0,0 +1,50 @@
import json
import re
from ..common import get_content, playlist_not_supported, url_size
from ..extractors import VideoExtractor
from ..util import log
__all__ = ['qingting_download_by_url']
class Qingting(VideoExtractor):
# every resource is described by its channel id and program id
# so vid is tuple (channel_id, program_id)
name = 'Qingting'
stream_types = [
{'id': '_default'}
]
ep = 'http://i.qingting.fm/wapi/channels/{}/programs/{}'
file_host = 'http://od.qingting.fm/{}'
mobile_pt = r'channels\/(\d+)\/programs/(\d+)'
def prepare(self, **kwargs):
if self.vid is None:
hit = re.search(self.__class__.mobile_pt, self.url)
self.vid = (hit.group(1), hit.group(2))
ep_url = self.__class__.ep.format(self.vid[0], self.vid[1])
meta = json.loads(get_content(ep_url))
if meta['code'] != 0:
log.wtf(meta['message']['errormsg'])
file_path = self.__class__.file_host.format(meta['data']['file_path'])
self.title = meta['data']['name']
duration = str(meta['data']['duration']) + 's'
self.streams['_default'] = {'src': [file_path], 'video_profile': duration, 'container': 'm4a'}
def extract(self, **kwargs):
self.streams['_default']['size'] = url_size(self.streams['_default']['src'][0])
def qingting_download_by_url(url, **kwargs):
Qingting().download_by_url(url, **kwargs)
site_info = 'Qingting'
download = qingting_download_by_url
download_playlist = playlist_not_supported('Qingting')
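
For reference, the vid here is just the (channel_id, program_id) pair pulled from the mobile URL and substituted into the wapi endpoint, as a tiny sketch with a made-up URL shows:

```
import re

mobile_pt = r'channels\/(\d+)\/programs/(\d+)'
ep = 'http://i.qingting.fm/wapi/channels/{}/programs/{}'

url = 'http://m.qingting.fm/channels/123456/programs/7891011'  # hypothetical
hit = re.search(mobile_pt, url)
vid = (hit.group(1), hit.group(2))
print(ep.format(*vid))
# -> http://i.qingting.fm/wapi/channels/123456/programs/7891011
```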

View File

@ -2,89 +2,151 @@
__all__ = ['qq_download']
from ..common import *
from .qie import download as qieDownload
from urllib.parse import urlparse,parse_qs
from .qie_video import download_by_url as qie_video_download
from ..common import *
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) QQLive/10275340/50192209 Chrome/43.0.2357.134 Safari/537.36 QBCore/3.43.561.202 QQBrowser/9.0.2524.400'
}
def qq_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False):
info_api = 'http://vv.video.qq.com/getinfo?otype=json&appver=3%2E2%2E19%2E333&platform=11&defnpayver=1&vid=' + vid
info = get_html(info_api)
# http://v.sports.qq.com/#/cover/t0fqsm1y83r8v5j/a0026nvw5jr https://v.qq.com/x/cover/t0fqsm1y83r8v5j/a0026nvw5jr.html
video_json = None
platforms = [4100201, 11]
for platform in platforms:
info_api = 'http://vv.video.qq.com/getinfo?otype=json&appver=3.2.19.333&platform={}&defnpayver=1&defn=shd&vid={}'.format(platform, vid)
info = get_content(info_api, headers)
video_json = json.loads(match1(info, r'QZOutputJson=(.*)')[:-1])
parts_vid = video_json['vl']['vi'][0]['vid']
parts_ti = video_json['vl']['vi'][0]['ti']
parts_prefix = video_json['vl']['vi'][0]['ul']['ui'][0]['url']
parts_formats = video_json['fl']['fi']
# find best quality
# only looking for fhd(1080p) and shd(720p) here.
# 480p usually comes as a single file and will be downloaded as a fallback.
best_quality = ''
for part_format in parts_formats:
if part_format['name'] == 'fhd':
best_quality = 'fhd'
if not video_json.get('msg')=='cannot play outside':
break
fn_pre = video_json['vl']['vi'][0]['lnk']
title = video_json['vl']['vi'][0]['ti']
host = video_json['vl']['vi'][0]['ul']['ui'][0]['url']
seg_cnt = fc_cnt = video_json['vl']['vi'][0]['cl']['fc']
if part_format['name'] == 'shd':
best_quality = 'shd'
filename = video_json['vl']['vi'][0]['fn']
if seg_cnt == 0:
seg_cnt = 1
else:
fn_pre, magic_str, video_type = filename.split('.')
for part_format in parts_formats:
if (not best_quality == '') and (not part_format['name'] == best_quality):
continue
part_format_id = part_format['id']
part_format_sl = part_format['sl']
if part_format_sl == 0:
part_urls= []
total_size = 0
try:
# For fhd(1080p), every part is about 100M and 6 minutes
# try up to 100 parts here, which limits the longest single video to about 10 hours.
for part in range(1,100):
filename = vid + '.p' + str(part_format_id % 1000) + '.' + str(part) + '.mp4'
key_api = "http://vv.video.qq.com/getkey?otype=json&platform=11&format=%s&vid=%s&filename=%s" % (part_format_id, parts_vid, filename)
#print(filename)
#print(key_api)
part_info = get_html(key_api)
for part in range(1, seg_cnt+1):
if fc_cnt == 0:
# fix json parsing error
# example:https://v.qq.com/x/page/w0674l9yrrh.html
part_format_id = video_json['vl']['vi'][0]['cl']['keyid'].split('.')[-1]
else:
part_format_id = video_json['vl']['vi'][0]['cl']['ci'][part - 1]['keyid'].split('.')[1]
filename = '.'.join([fn_pre, magic_str, str(part), video_type])
key_api = "http://vv.video.qq.com/getkey?otype=json&platform=11&format={}&vid={}&filename={}&appver=3.2.19.333".format(part_format_id, vid, filename)
part_info = get_content(key_api, headers)
key_json = json.loads(match1(part_info, r'QZOutputJson=(.*)')[:-1])
#print(key_json)
if key_json.get('key') is None:
vkey = video_json['vl']['vi'][0]['fvkey']
url = '{}{}?vkey={}'.format(video_json['vl']['vi'][0]['ul']['ui'][0]['url'], fn_pre + '.mp4', vkey)
else:
vkey = key_json['key']
url = '%s/%s?vkey=%s' % (parts_prefix, filename, vkey)
url = '{}{}?vkey={}'.format(host, filename, vkey)
if not vkey:
if part == 1:
log.wtf(key_json['msg'])
else:
log.w(key_json['msg'])
break
if key_json.get('filename') is None:
log.w(key_json['msg'])
break
part_urls.append(url)
_, ext, size = url_info(url, faker=True)
_, ext, size = url_info(url)
total_size += size
except:
pass
print_info(site_info, parts_ti, ext, total_size)
if not info_only:
download_urls(part_urls, parts_ti, ext, total_size, output_dir=output_dir, merge=merge)
else:
fvkey = output_json['vl']['vi'][0]['fvkey']
mp4 = output_json['vl']['vi'][0]['cl'].get('ci', None)
if mp4:
mp4 = mp4[0]['keyid'].replace('.10', '.p') + '.mp4'
else:
mp4 = output_json['vl']['vi'][0]['fn']
url = '%s/%s?vkey=%s' % ( parts_prefix, mp4, fvkey )
_, ext, size = url_info(url, faker=True)
print_info(site_info, title, ext, size)
print_info(site_info, title, ext, total_size)
if not info_only:
download_urls([url], title, ext, size, output_dir=output_dir, merge=merge)
download_urls(part_urls, title, ext, total_size, output_dir=output_dir, merge=merge)
def kg_qq_download_by_shareid(shareid, output_dir='.', info_only=False, caption=False):
BASE_URL = 'http://cgi.kg.qq.com/fcgi-bin/kg_ugc_getdetail'
params_str = '?dataType=jsonp&jsonp=callback&jsonpCallback=jsopgetsonginfo&v=4&outCharset=utf-8&shareid=' + shareid
url = BASE_URL + params_str
content = get_content(url, headers)
json_str = content[len('jsonpcallback('):-1]
json_data = json.loads(json_str)
playurl = json_data['data']['playurl']
videourl = json_data['data']['playurl_video']
real_url = playurl if playurl else videourl
real_url = real_url.replace('\/', '/')
ksong_mid = json_data['data']['ksong_mid']
lyric_url = 'http://cgi.kg.qq.com/fcgi-bin/fcg_lyric?jsonpCallback=jsopgetlrcdata&outCharset=utf-8&ksongmid=' + ksong_mid
lyric_data = get_content(lyric_url)
lyric_string = lyric_data[len('jsopgetlrcdata('):-1]
lyric_json = json.loads(lyric_string)
lyric = lyric_json['data']['lyric']
title = match1(lyric, r'\[ti:([^\]]*)\]')
type, ext, size = url_info(real_url)
if not title:
title = shareid
print_info('腾讯全民K歌', title, type, size)
if not info_only:
download_urls([real_url], title, ext, size, output_dir, merge=False)
if caption:
caption_filename = title + '.lrc'
caption_path = output_dir + '/' + caption_filename
with open(caption_path, 'w') as f:
lrc_list = lyric.split('\r\n')
for line in lrc_list:
f.write(line)
f.write('\n')
def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
""""""
if re.match(r'https?://(m\.)?egame.qq.com/', url):
from . import qq_egame
qq_egame.qq_egame_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
return
if 'kg.qq.com' in url or 'kg2.qq.com' in url:
shareid = url.split('?s=')[-1]
caption = kwargs['caption']
kg_qq_download_by_shareid(shareid, output_dir=output_dir, info_only=info_only, caption=caption)
return
if 'live.qq.com' in url:
if 'live.qq.com/video/v' in url:
qie_video_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
else:
qieDownload(url, output_dir=output_dir, merge=merge, info_only=info_only)
return
#do redirect
if 'v.qq.com/page' in url:
# for URLs like this:
# http://v.qq.com/page/k/9/7/k0194pwgw97.html
content = get_html(url)
url = match1(content,r'window\.location\.href="(.*?)"')
if 'mp.weixin.qq.com/s' in url:
content = get_content(url, headers)
vids = matchall(content, [r'[?;]vid=(\w+)'])
for vid in vids:
qq_download_by_vid(vid, vid, output_dir, merge, info_only)
return
if 'kuaibao.qq.com' in url or re.match(r'http://daxue.qq.com/content/content/id/\d+', url):
content = get_html(url)
if 'kuaibao.qq.com/s/' in url:
# https://kuaibao.qq.com/s/20180521V0Z9MH00
nid = match1(url, r'/s/([^/&?#]+)')
content = get_content('https://kuaibao.qq.com/getVideoRelate?id=' + nid)
info_json = json.loads(content)
vid=info_json['videoinfo']['vid']
title=info_json['videoinfo']['title']
elif 'kuaibao.qq.com' in url or re.match(r'http://daxue.qq.com/content/content/id/\d+', url):
# http://daxue.qq.com/content/content/id/2321
content = get_content(url, headers)
vid = match1(content, r'vid\s*=\s*"\s*([^"]+)"')
title = match1(content, r'title">([^"]+)</p>')
title = title.strip() if title else vid
@ -92,17 +154,31 @@ def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
vid = match1(url, r'\bvid=(\w+)')
# for embedded URLs; don't know what the title is
title = vid
elif 'view.inews.qq.com' in url:
# view.inews.qq.com/a/20180521V0Z9MH00
content = get_content(url, headers)
vid = match1(content, r'"vid":"(\w+)"')
title = match1(content, r'"title":"(\w+)"')
else:
content = get_html(url)
vid = parse_qs(urlparse(url).query).get('vid') #for links specified vid like http://v.qq.com/cover/p/ps6mnfqyrfo7es3.html?vid=q0181hpdvo5
vid = vid[0] if vid else match1(content, r'vid"*\s*:\s*"\s*([^"]+)"') #general fallback
content = get_content(url, headers)
#vid = parse_qs(urlparse(url).query).get('vid') #for links specified vid like http://v.qq.com/cover/p/ps6mnfqyrfo7es3.html?vid=q0181hpdvo5
rurl = match1(content, r'<link.*?rel\s*=\s*"canonical".*?href\s*="(.+?)".*?>') #https://v.qq.com/x/cover/9hpjiv5fhiyn86u/t0522x58xma.html
vid = ""
if rurl:
vid = rurl.split('/')[-1].split('.')[0]
# https://v.qq.com/x/page/d0552xbadkl.html https://y.qq.com/n/yqq/mv/v/g00268vlkzy.html
if vid == "undefined" or vid == "index":
vid = ""
vid = vid if vid else url.split('/')[-1].split('.')[0] #https://v.qq.com/x/cover/ps6mnfqyrfo7es3/q0181hpdvo5.html?
vid = vid if vid else match1(content, r'vid"*\s*:\s*"\s*([^"]+)"') #general fallback
if not vid:
vid = match1(content, r'id"*\s*:\s*"(.+?)"')
title = match1(content,r'<a.*?id\s*=\s*"%s".*?title\s*=\s*"(.+?)".*?>'%vid)
title = match1(content, r'title">([^"]+)</p>') if not title else title
title = match1(content, r'"title":"([^"]+)"') if not title else title
title = vid if not title else title #general fallback
qq_download_by_vid(vid, title, output_dir, merge, info_only)
site_info = "QQ.com"

View File

@ -0,0 +1,44 @@
import re
import json
from ..common import *
from ..extractors import VideoExtractor
from ..util import log
from ..util.strings import unescape_html
__all__ = ['qq_egame_download']
def qq_egame_download(url,
output_dir='.',
merge=True,
info_only=False,
**kwargs):
uid = re.search('\d\d\d+', url)
an_url = "https://m.egame.qq.com/live?anchorid={}&".format(uid.group(0))
page = get_content(an_url)
server_data = re.search(r'window\.serverData\s*=\s*({.+?});', page)
if server_data is None:
log.wtf('Cannot find window.serverData')
json_data = json.loads(server_data.group(1))
if json_data['anchorInfo']['data']['isLive'] == 0:
log.wtf('Offline...')
live_info = json_data['liveInfo']['data']
title = '{}_{}'.format(live_info['profileInfo']['nickName'],
live_info['videoInfo']['title'])
real_url = live_info['videoInfo']['streamInfos'][0]['playUrl']
print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_url_ffmpeg(
real_url,
title,
'flv',
params={},
output_dir=output_dir,
merge=merge)
site_info = "egame.qq.com"
download = qq_egame_download
download_playlist = playlist_not_supported('qq_egame')
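
The extractor relies on the live page embedding a JSON blob assigned to window.serverData; the non-greedy regex captures that object and json.loads parses it. An offline sketch of the same capture, with a made-up page snippet:

```
import json
import re

page = 'foo();window.serverData = {"anchorInfo": {"data": {"isLive": 1}}};bar()'
hit = re.search(r'window\.serverData\s*=\s*({.+?});', page)
data = json.loads(hit.group(1))
print(data['anchorInfo']['data']['isLive'])  # 1 means the anchor is live
```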

View File

@ -3,45 +3,50 @@
__all__ = ['sina_download', 'sina_download_by_vid', 'sina_download_by_vkey']
from ..common import *
from ..util.log import *
from hashlib import md5
from random import randint
from time import time
from xml.dom.minidom import parseString
import urllib.parse
def get_k(vid, rand):
t = str(int('{0:b}'.format(int(time()))[:-6], 2))
return md5((vid + 'Z6prk18aWxP278cVAH' + t + rand).encode('utf-8')).hexdigest()[:16] + t
def video_info_xml(vid):
def api_req(vid):
rand = "0.{0}{1}".format(randint(10000, 10000000), randint(10000, 10000000))
url = 'http://ask.ivideo.sina.com.cn/v_play.php?vid={0}&ran={1}&p=i&k={2}'.format(vid, rand, get_k(vid, rand))
xml = get_content(url, headers=fake_headers, decoded=True)
t = str(int('{0:b}'.format(int(time()))[:-6], 2))
k = md5((vid + 'Z6prk18aWxP278cVAH' + t + rand).encode('utf-8')).hexdigest()[:16] + t
url = 'http://ask.ivideo.sina.com.cn/v_play.php?vid={0}&ran={1}&p=i&k={2}'.format(vid, rand, k)
xml = get_content(url, headers=fake_headers)
return xml
def video_info(xml):
urls = re.findall(r'<url>(?:<!\[CDATA\[)?(.*?)(?:\]\]>)?</url>', xml)
name = match1(xml, r'<vname>(?:<!\[CDATA\[)?(.+?)(?:\]\]>)?</vname>')
vstr = match1(xml, r'<vstr>(?:<!\[CDATA\[)?(.+?)(?:\]\]>)?</vstr>')
return urls, name, vstr
video = parseString(xml).getElementsByTagName('video')[0]
result = video.getElementsByTagName('result')[0]
if result.firstChild.nodeValue == 'error':
message = video.getElementsByTagName('message')[0]
return None, message.firstChild.nodeValue, None
vname = video.getElementsByTagName('vname')[0].firstChild.nodeValue
durls = video.getElementsByTagName('durl')
urls = []
size = 0
for durl in durls:
url = durl.getElementsByTagName('url')[0].firstChild.nodeValue
seg_size = durl.getElementsByTagName('filesize')[0].firstChild.nodeValue
urls.append(url)
size += int(seg_size)
return urls, vname, size
def sina_download_by_vid(vid, title=None, output_dir='.', merge=True, info_only=False):
"""Downloads a Sina video by its unique vid.
http://video.sina.com.cn/
"""
xml = video_info_xml(vid)
sina_download_by_xml(xml, title, output_dir, merge, info_only)
def sina_download_by_xml(xml, title, output_dir, merge, info_only):
urls, name, vstr = video_info(xml)
title = title or name
assert title
size = 0
for url in urls:
_, _, temp = url_info(url)
size += temp
xml = api_req(vid)
urls, name, size = video_info(xml)
if urls is None:
log.wtf(name)
title = name
print_info(site_info, title, 'flv', size)
if not info_only:
download_urls(urls, title, 'flv', size, output_dir = output_dir, merge = merge)
@ -58,9 +63,40 @@ def sina_download_by_vkey(vkey, title=None, output_dir='.', merge=True, info_onl
if not info_only:
download_urls([url], title, 'flv', size, output_dir = output_dir, merge = merge)
def sina_zxt(url, output_dir='.', merge=True, info_only=False, **kwargs):
ep = 'http://s.video.sina.com.cn/video/play?video_id='
frag = urllib.parse.urlparse(url).fragment
if not frag:
log.wtf('No video specified with fragment')
meta = json.loads(get_content(ep + frag))
if meta['code'] != 1:
# Yes they use 1 for success.
log.wtf(meta['message'])
title = meta['data']['title']
videos = sorted(meta['data']['videos'], key = lambda i: int(i['size']))
if len(videos) == 0:
log.wtf('No video file returned by API server')
vid = videos[-1]['file_id']
container = videos[-1]['type']
size = int(videos[-1]['size'])
if container == 'hlv':
container = 'flv'
urls, _, _ = video_info(api_req(vid))
print_info(site_info, title, container, size)
if not info_only:
download_urls(urls, title, container, size, output_dir=output_dir, merge=merge, **kwargs)
return
def sina_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
"""Downloads Sina videos by URL.
"""
if 'news.sina.com.cn/zxt' in url:
sina_zxt(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
return
vid = match1(url, r'vid=(\d+)')
if vid is None:
@ -73,10 +109,14 @@ def sina_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
if vid is None:
vid = match1(video_page, r'vid:"?(\d+)"?')
if vid:
title = match1(video_page, r'title\s*:\s*\'([^\']+)\'')
sina_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
#title = match1(video_page, r'title\s*:\s*\'([^\']+)\'')
sina_download_by_vid(vid, output_dir=output_dir, merge=merge, info_only=info_only)
else:
vkey = match1(video_page, r'vkey\s*:\s*"([^"]+)"')
if vkey is None:
vid = match1(url, r'#(\d+)')
sina_download_by_vid(vid, output_dir=output_dir, merge=merge, info_only=info_only)
return
title = match1(video_page, r'title\s*:\s*"([^"]+)"')
sina_download_by_vkey(vkey, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
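
The new api_req above folds the old get_k signing into a single request builder: the UNIX timestamp is coarsened by dropping its six lowest binary digits, hashed with the vid, a fixed salt, and a random string, and the first 16 hex digits of the MD5 plus the coarse timestamp form the k parameter. A standalone sketch of that scheme, with a hypothetical vid:

```
from hashlib import md5
from random import randint
from time import time

def make_key(vid, rand):
    # timestamp with its 6 lowest binary digits dropped (changes every ~64s)
    t = str(int('{0:b}'.format(int(time()))[:-6], 2))
    return md5((vid + 'Z6prk18aWxP278cVAH' + t + rand).encode('utf-8')).hexdigest()[:16] + t

rand = '0.{0}{1}'.format(randint(10000, 10000000), randint(10000, 10000000))
print(make_key('123456789', rand))  # 16 hex chars followed by the coarse timestamp
```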

View File

@ -15,22 +15,24 @@ Changelog:
new api
'''
def real_url(host,vid,tvid,new,clipURL,ck):
url = 'http://'+host+'/?prot=9&prod=flash&pt=1&file='+clipURL+'&new='+new +'&key='+ ck+'&vid='+str(vid)+'&uid='+str(int(time.time()*1000))+'&t='+str(random())+'&rb=1'
return json.loads(get_html(url))['url']
def real_url(fileName, key, ch):
url = "https://data.vod.itc.cn/ip?new=" + fileName + "&num=1&key=" + key + "&ch=" + ch + "&pt=1&pg=2&prod=h5n"
return json.loads(get_html(url))['servers'][0]['url']
def sohu_download(url, output_dir='.', merge=True, info_only=False, extractor_proxy=None, **kwargs):
if re.match(r'http://share.vrs.sohu.com', url):
vid = r1('id=(\d+)', url)
else:
html = get_html(url)
vid = r1(r'\Wvid\s*[\:=]\s*[\'"]?(\d+)[\'"]?', html)
vid = r1(r'\Wvid\s*[\:=]\s*[\'"]?(\d+)[\'"]?', html) or r1(r'bid:\'(\d+)\',', html) or r1(r'bid=(\d+)', html)
assert vid
if re.match(r'http://tv.sohu.com/', url):
if extractor_proxy:
set_proxy(tuple(extractor_proxy.split(":")))
info = json.loads(get_decoded_html('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % vid))
if info and info.get("data", ""):
for qtyp in ["oriVid", "superVid", "highVid", "norVid", "relativeId"]:
if 'data' in info:
hqvid = info['data'][qtyp]
@ -51,9 +53,8 @@ def sohu_download(url, output_dir = '.', merge = True, info_only = False, extrac
title = data['tvName']
size = sum(data['clipsBytes'])
assert len(data['clipsURL']) == len(data['clipsBytes']) == len(data['su'])
for new,clip,ck, in zip(data['su'], data['clipsURL'], data['ck']):
clipURL = urlparse(clip).path
urls.append(real_url(host,hqvid,tvid,new,clipURL,ck))
for fileName, key in zip(data['su'], data['ck']):
urls.append(real_url(fileName, key, data['ch']))
# assert data['clipsURL'][0].endswith('.mp4')
else:
@ -66,14 +67,14 @@ def sohu_download(url, output_dir = '.', merge = True, info_only = False, extrac
title = data['tvName']
size = sum(map(int, data['clipsBytes']))
assert len(data['clipsURL']) == len(data['clipsBytes']) == len(data['su'])
for new,clip,ck, in zip(data['su'], data['clipsURL'], data['ck']):
clipURL = urlparse(clip).path
urls.append(real_url(host,vid,tvid,new,clipURL,ck))
for fileName, key in zip(data['su'], data['ck']):
urls.append(real_url(fileName, key, data['ch']))
print_info(site_info, title, 'mp4', size)
if not info_only:
download_urls(urls, title, 'mp4', size, output_dir, refer=url, merge=merge)
site_info = "Sohu.com"
download = sohu_download
download_playlist = playlist_not_supported('sohu')

View File

@ -1,31 +1,80 @@
#!/usr/bin/env python
__all__ = ['soundcloud_download', 'soundcloud_download_by_id']
__all__ = ['sndcd_download']
from ..common import *
def soundcloud_download_by_id(id, title = None, output_dir = '.', merge = True, info_only = False):
assert title
#if info["downloadable"]:
# url = 'https://api.soundcloud.com/tracks/' + id + '/download?client_id=b45b1aa10f1ac2941910a7f0d10f8e28'
url = 'https://api.soundcloud.com/tracks/' + id + '/stream?client_id=02gUJC0hH2ct1EGOcYXQIzRFU91c72Ea'
assert url
type, ext, size = url_info(url)
print_info(site_info, title, type, size)
if not info_only:
download_urls([url], title, ext, size, output_dir, merge = merge)
def soundcloud_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
metadata = get_html('https://api.soundcloud.com/resolve.json?url=' + url + '&client_id=02gUJC0hH2ct1EGOcYXQIzRFU91c72Ea')
import re
import json
info = json.loads(metadata)
title = info["title"]
id = str(info["id"])
import urllib.error
def get_sndcd_apikey():
home_page = get_content('https://soundcloud.com')
js_url = re.findall(r'script crossorigin src="(.+?)"></script>', home_page)[-1]
client_id = get_content(js_url)
return re.search(r'client_id:"(.+?)"', client_id).group(1)
def get_resource_info(resource_url, client_id):
cont = get_content(resource_url, decoded=True)
x = re.escape('forEach(function(e){n(e)})}catch(t){}})},')
x = re.search(r'' + x + r'(.*)\);</script>', cont)
info = json.loads(x.group(1))[-1]['data'][0]
info = info['tracks'] if info.get('track_count') else [info]
ids = [i['id'] for i in info if i.get('comment_count') is None]
ids = list(map(str, ids))
ids_split = ['%2C'.join(ids[i:i+10]) for i in range(0, len(ids), 10)]
api_url = 'https://api-v2.soundcloud.com/tracks?ids={ids}&client_id={client_id}&%5Bobject%20Object%5D=&app_version=1584348206&app_locale=en'
res = []
for ids in ids_split:
uri = api_url.format(ids=ids, client_id=client_id)
cont = get_content(uri, decoded=True)
res += json.loads(cont)
res = iter(res)
info = [next(res) if i.get('comment_count') is None else i for i in info]
return info
def sndcd_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
client_id = get_sndcd_apikey()
r_info = get_resource_info(url, client_id)
for info in r_info:
title = info['title']
metadata = info.get('publisher_metadata')
transcodings = info['media']['transcodings']
sq = [i for i in transcodings if i['quality'] == 'sq']
hq = [i for i in transcodings if i['quality'] == 'hq']
# source url
surl = sq[0] if hq == [] else hq[0]
surl = surl['url']
uri = surl + '?client_id=' + client_id
r = get_content(uri)
surl = json.loads(r)['url']
m3u8 = get_content(surl)
# url list
urll = re.findall(r'http.*?(?=\n)', m3u8)
size = urls_size(urll)
print_info(site_info, title, 'audio/mpeg', size)
print(end='', flush=True)
if not info_only:
download_urls(urll, title=title, ext='mp3', total_size=size, output_dir=output_dir, merge=True)
soundcloud_download_by_id(id, title, output_dir, merge = merge, info_only = info_only)
site_info = "SoundCloud.com"
download = soundcloud_download
download_playlist = playlist_not_supported('soundcloud')
download = sndcd_download
download_playlist = sndcd_download
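
get_resource_info above resolves playlists by batching track ids ten at a time into the api-v2 tracks endpoint, joined with a URL-encoded comma. The chunking itself can be illustrated offline with hypothetical ids:

```
ids = [str(i) for i in range(1, 24)]   # 23 hypothetical track ids
ids_split = ['%2C'.join(ids[i:i+10]) for i in range(0, len(ids), 10)]
print(len(ids_split))   # 3 API calls are enough
print(ids_split[0])     # '1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10'
```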

View File

@ -7,9 +7,10 @@ import json
def ted_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url)
metadata = json.loads(match1(html, r'({"talks"(.*)})\)'))
patt = r'"__INITIAL_DATA__"\s*:\s*\{(.+)\}'
metadata = json.loads('{' + match1(html, patt) + '}')
title = metadata['talks'][0]['title']
nativeDownloads = metadata['talks'][0]['nativeDownloads']
nativeDownloads = metadata['talks'][0]['downloads']['nativeDownloads']
for quality in ['high', 'medium', 'low']:
if quality in nativeDownloads:
url = nativeDownloads[quality]

View File

@ -1,83 +0,0 @@
#!/usr/bin/env python
__all__ = ['thvideo_download']
from ..common import *
from xml.dom.minidom import parseString
#----------------------------------------------------------------------
def thvideo_cid_to_url(cid, p):
"""int,int->list
From Biligrab."""
interface_url = 'http://thvideo.tv/api/playurl.php?cid={cid}-{p}'.format(cid = cid, p = p)
data = get_content(interface_url)
rawurl = []
dom = parseString(data)
for node in dom.getElementsByTagName('durl'):
url = node.getElementsByTagName('url')[0]
rawurl.append(url.childNodes[0].data)
return rawurl
#----------------------------------------------------------------------
def th_video_get_title(url, p):
""""""
if re.match(r'http://thvideo.tv/v/\w+', url):
html = get_content(url)
title = match1(html, r'<meta property="og:title" content="([^"]*)"').strip()
video_list = match1(html, r'<li>cid=(.+)</li>').split('**')
if int(p) > 0: #not the 1st P or multi part
title = title + ' - ' + [i.split('=')[-1:][0].split('|')[1] for i in video_list][p]
return title
#----------------------------------------------------------------------
def thvideo_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
if re.match(r'http://thvideo.tv/v/\w+', url):
if 'p' in kwargs and kwargs['p']:
p = kwargs['p']
else:
p = int(match1(url, r'http://thvideo.tv/v/th\d+#(\d+)'))
p -= 1
if not p or p < 0:
p = 0
if 'title' in kwargs and kwargs['title']:
title = kwargs['title']
else:
title = th_video_get_title(url, p)
cid = match1(url, r'http://thvideo.tv/v/th(\d+)')
type_ = ''
size = 0
urls = thvideo_cid_to_url(cid, p)
for url in urls:
_, type_, temp = url_info(url)
size += temp
print_info(site_info, title, type_, size)
if not info_only:
download_urls(urls, title, type_, total_size=None, output_dir=output_dir, merge=merge)
#----------------------------------------------------------------------
def thvideo_download_playlist(url, output_dir = '.', merge = False, info_only = False, **kwargs):
""""""
if re.match(r'http://thvideo.tv/v/\w+', url):
html = get_content(url)
video_list = match1(html, r'<li>cid=(.+)</li>').split('**')
title_base = th_video_get_title(url, 0)
for p, v in video_list:
part_title = [i.split('=')[-1:][0].split('|')[1] for i in video_list][p]
title = title_base + part_title
thvideo_download(url, output_dir, merge,
info_only, p = p, title = title)
site_info = "THVideo"
download = thvideo_download
download_playlist = thvideo_download_playlist

View File

@ -0,0 +1,47 @@
#!/usr/bin/env python
__all__ = ['tiktok_download']
from ..common import *
def tiktok_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
referUrl = url.split('?')[0]
headers = fake_headers
# trick or treat
html = get_content(url, headers=headers)
data = r1(r'<script id="__NEXT_DATA__".*?>(.*?)</script>', html)
info = json.loads(data)
wid = info['props']['initialProps']['$wid']
cookie = 'tt_webid=%s; tt_webid_v2=%s' % (wid, wid)
# here's the cookie
headers['Cookie'] = cookie
# try again
html = get_content(url, headers=headers)
data = r1(r'<script id="__NEXT_DATA__".*?>(.*?)</script>', html)
info = json.loads(data)
wid = info['props']['initialProps']['$wid']
cookie = 'tt_webid=%s; tt_webid_v2=%s' % (wid, wid)
videoData = info['props']['pageProps']['itemInfo']['itemStruct']
videoId = videoData['id']
videoUrl = videoData['video']['downloadAddr']
uniqueId = videoData['author'].get('uniqueId')
nickName = videoData['author'].get('nickname')
title = '%s [%s]' % (nickName or uniqueId, videoId)
# we also need the referer
headers['Referer'] = referUrl
mime, ext, size = url_info(videoUrl, headers=headers)
print_info(site_info, title, mime, size)
if not info_only:
download_urls([videoUrl], title, ext, size, output_dir=output_dir, merge=merge, headers=headers)
site_info = "TikTok.com"
download = tiktok_download
download_playlist = playlist_not_supported('tiktok')

View File

@ -0,0 +1,86 @@
#!/usr/bin/env python
import binascii
import random
from json import loads
from urllib.parse import urlparse
from ..common import *
try:
from base64 import decodebytes
except ImportError:
from base64 import decodestring
decodebytes = decodestring
__all__ = ['toutiao_download', ]
def random_with_n_digits(n):
return random.randint(10 ** (n - 1), (10 ** n) - 1)
def sign_video_url(vid):
r = str(random_with_n_digits(16))
url = 'https://ib.365yg.com/video/urls/v/1/toutiao/mp4/{vid}'.format(vid=vid)
n = urlparse(url).path + '?r=' + r
b_n = bytes(n, encoding="utf-8")
s = binascii.crc32(b_n)
aid = 1364
ts = int(time.time() * 1000)
return url + '?r={r}&s={s}&aid={aid}&vfrom=xgplayer&callback=axiosJsonpCallback1&_={ts}'.format(r=r, s=s, aid=aid,
ts=ts)
class ToutiaoVideoInfo(object):
def __init__(self):
self.bitrate = None
self.definition = None
self.size = None
self.height = None
self.width = None
self.type = None
self.url = None
def __str__(self):
return json.dumps(self.__dict__)
def get_file_by_vid(video_id):
vRet = []
url = sign_video_url(video_id)
ret = get_content(url)
ret = loads(ret[20:-1])
vlist = ret.get('data').get('video_list')
if len(vlist) > 0:
vInfo = vlist.get(sorted(vlist.keys(), reverse=True)[0])
vUrl = vInfo.get('main_url')
vUrl = decodebytes(vUrl.encode('ascii')).decode('ascii')
videoInfo = ToutiaoVideoInfo()
videoInfo.bitrate = vInfo.get('bitrate')
videoInfo.definition = vInfo.get('definition')
videoInfo.size = vInfo.get('size')
videoInfo.height = vInfo.get('vheight')
videoInfo.width = vInfo.get('vwidth')
videoInfo.type = vInfo.get('vtype')
videoInfo.url = vUrl
vRet.append(videoInfo)
return vRet
def toutiao_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url, faker=True)
video_id = match1(html, r".*?videoId: '(?P<vid>.*)'")
title = match1(html, '.*?<title>(?P<title>.*?)</title>')
video_file_list = get_file_by_vid(video_id)  # call the API to get the source video files
type, ext, size = url_info(video_file_list[0].url, faker=True)
print_info(site_info=site_info, title=title, type=type, size=size)
if not info_only:
download_urls([video_file_list[0].url], title, ext, size, output_dir, merge=merge, faker=True)
site_info = "Toutiao.com"
download = toutiao_download
download_playlist = playlist_not_supported("toutiao")
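
sign_video_url above signs the API path with a CRC32 checksum over the path plus the random r parameter. A minimal sketch of that signing step, using a hypothetical video id:

```
import binascii
import random
import time
from urllib.parse import urlparse

vid = '6591876096120275470'                       # hypothetical video id
r = str(random.randint(10 ** 15, 10 ** 16 - 1))   # 16 random digits
url = 'https://ib.365yg.com/video/urls/v/1/toutiao/mp4/{vid}'.format(vid=vid)
s = binascii.crc32(bytes(urlparse(url).path + '?r=' + r, encoding='utf-8'))
signed = url + '?r={r}&s={s}&aid=1364&vfrom=xgplayer&callback=axiosJsonpCallback1&_={ts}'.format(
    r=r, s=s, ts=int(time.time() * 1000))
print(signed)
```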

View File

@ -26,7 +26,10 @@ def tudou_download_by_id(id, title, output_dir = '.', merge = True, info_only =
html = get_html('http://www.tudou.com/programs/view/%s/' % id)
iid = r1(r'iid\s*[:=]\s*(\S+)', html)
try:
title = r1(r'kw\s*[:=]\s*[\'\"]([^\n]+?)\'\s*\n', html).replace("\\'", "\'")
except AttributeError:
title = ''
tudou_download_by_iid(iid, title, output_dir = output_dir, merge = merge, info_only = info_only)
def tudou_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
@ -42,16 +45,23 @@ def tudou_download(url, output_dir = '.', merge = True, info_only = False, **kwa
if id:
return tudou_download_by_id(id, title="", info_only=info_only)
html = get_decoded_html(url)
html = get_content(url)
title = r1(r'kw\s*[:=]\s*[\'\"]([^\n]+?)\'\s*\n', html).replace("\\'", "\'")
try:
title = r1(r'\Wkw\s*[:=]\s*[\'\"]([^\n]+?)\'\s*\n', html).replace("\\'", "\'")
assert title
title = unescape_html(title)
except AttributeError:
title = match1(html, r'id=\"subtitle\"\s*title\s*=\s*\"([^\"]+)\"')
if title is None:
title = ''
vcode = r1(r'vcode\s*[:=]\s*\'([^\']+)\'', html)
if vcode is None:
vcode = match1(html, r'viden\s*[:=]\s*\"([\w+/=]+)\"')
if vcode:
from .youku import youku_download_by_vid
return youku_download_by_vid(vcode, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
return youku_download_by_vid(vcode, title=title, output_dir=output_dir, merge=merge, info_only=info_only, src='tudou', **kwargs)
iid = r1(r'iid\s*[:=]\s*(\d+)', html)
if not iid:

View File

@ -13,7 +13,29 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
universal_download(url, output_dir, merge=merge, info_only=info_only)
return
html = parse.unquote(get_html(url)).replace('\/', '/')
import ssl
ssl_context = request.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
cookie_handler = request.HTTPCookieProcessor()
opener = request.build_opener(ssl_context, cookie_handler)
request.install_opener(opener)
page = get_html(url)
form_key = match1(page, r'id="tumblr_form_key" content="([^"]+)"')
if form_key is not None:
# bypass GDPR consent page
referer = 'https://www.tumblr.com/privacy/consent?redirect=%s' % parse.quote_plus(url)
post_content('https://www.tumblr.com/svc/privacy/consent',
headers={
'Content-Type': 'application/json',
'User-Agent': fake_headers['User-Agent'],
'Referer': referer,
'X-tumblr-form-key': form_key,
'X-Requested-With': 'XMLHttpRequest'
},
post_data_raw='{"eu_resident":true,"gdpr_is_acceptable_age":true,"gdpr_consent_core":true,"gdpr_consent_first_party_ads":true,"gdpr_consent_third_party_ads":true,"gdpr_consent_search_history":true,"redirect_to":"%s","gdpr_reconsent":false}' % url)
page = get_html(url, faker=True)
html = parse.unquote(page).replace('\/', '/')
feed = r1(r'<meta property="og:type" content="tumblr-feed:(\w+)" />', html)
if feed in ['photo', 'photoset', 'entry'] or feed is None:
@ -21,26 +43,36 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
page_title = r1(r'<meta name="description" content="([^"\n]+)', html) or \
r1(r'<meta property="og:description" content="([^"\n]+)', html) or \
r1(r'<title>([^<\n]*)', html)
urls = re.findall(r'(https?://[^;"&]+/tumblr_[^;"]+_\d+\.jpg)', html) +\
re.findall(r'(https?://[^;"&]+/tumblr_[^;"]+_\d+\.png)', html) +\
re.findall(r'(https?://[^;"&]+/tumblr_[^";]+_\d+\.gif)', html)
urls = re.findall(r'(https?://[^;"&]+/tumblr_[^;"&]+_\d+\.jpg)', html) +\
re.findall(r'(https?://[^;"&]+/tumblr_[^;"&]+_\d+\.png)', html) +\
re.findall(r'(https?://[^;"&]+/tumblr_[^";&]+_\d+\.gif)', html)
tuggles = {}
for url in urls:
filename = parse.unquote(url.split('/')[-1])
if url.endswith('.gif'):
hd_url = url
elif url.endswith('.jpg'):
hd_url = r1(r'(.+)_\d+\.jpg$', url) + '_1280.jpg' # FIXME: decide actual quality
elif url.endswith('.png'):
hd_url = r1(r'(.+)_\d+\.png$', url) + '_1280.png' # FIXME: decide actual quality
else:
continue
filename = parse.unquote(hd_url.split('/')[-1])
title = '.'.join(filename.split('.')[:-1])
tumblr_id = r1(r'^tumblr_(.+)_\d+$', title)
quality = int(r1(r'^tumblr_.+_(\d+)$', title))
ext = filename.split('.')[-1]
size = int(get_head(url)['Content-Length'])
try:
size = int(get_head(hd_url)['Content-Length'])
if tumblr_id not in tuggles or tuggles[tumblr_id]['quality'] < quality:
tuggles[tumblr_id] = {
'title': title,
'url': url,
'url': hd_url,
'quality': quality,
'ext': ext,
'size': size,
}
except: pass
if tuggles:
size = sum([tuggles[t]['size'] for t in tuggles])
@ -68,6 +100,11 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
real_url = r1(r'<source src="([^"]*)"', html)
if not real_url:
iframe_url = r1(r'<[^>]+tumblr_video_container[^>]+><iframe[^>]+src=[\'"]([^\'"]*)[\'"]', html)
if iframe_url is None:
universal_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
return
if iframe_url:
iframe_html = get_content(iframe_url, headers=fake_headers)
real_url = r1(r'<video[^>]*>[\n ]*<source[^>]+src=[\'"]([^\'"]*)[\'"]', iframe_html)
@ -92,7 +129,11 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
r1(r'<meta property="og:description" content="([^"]*)" />', html) or
r1(r'<title>([^<\n]*)', html) or url.split("/")[4]).replace('\n', '')
type, ext, size = url_info(real_url)
# this is better
vcode = r1(r'tumblr_(\w+)', real_url)
real_url = 'https://vt.media.tumblr.com/tumblr_%s.mp4' % vcode
type, ext, size = url_info(real_url, faker=True)
print_info(site_info, title, type, size)
if not info_only:

View File

@ -3,87 +3,102 @@
__all__ = ['twitter_download']
from ..common import *
from .universal import *
from .vine import vine_download
def extract_m3u(source):
r1 = get_content(source)
s1 = re.findall(r'(/ext_tw_video/.*)', r1)
s1 += re.findall(r'(/amplify_video/.*)', r1)
r2 = get_content('https://video.twimg.com%s' % s1[-1])
s2 = re.findall(r'(/ext_tw_video/.*)', r2)
s2 += re.findall(r'(/amplify_video/.*)', r2)
return ['https://video.twimg.com%s' % i for i in s2]
def twitter_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url)
screen_name = r1(r'data-screen-name="([^"]*)"', html) or \
if re.match(r'https?://pbs\.twimg\.com', url):
universal_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
return
if re.match(r'https?://mobile', url): # normalize mobile URL
url = 'https://' + match1(url, r'//mobile\.(.+)')
if re.match(r'https?://twitter\.com/i/moments/', url): # moments
html = get_html(url, faker=True)
paths = re.findall(r'data-permalink-path="([^"]+)"', html)
for path in paths:
twitter_download('https://twitter.com' + path,
output_dir=output_dir,
merge=merge,
info_only=info_only,
**kwargs)
return
html = get_html(url, faker=False) # disable faker to prevent 302 infinite redirect
screen_name = r1(r'twitter\.com/([^/]+)', url) or r1(r'data-screen-name="([^"]*)"', html) or \
r1(r'<meta name="twitter:title" content="([^"]*)"', html)
item_id = r1(r'data-item-id="([^"]*)"', html) or \
item_id = r1(r'twitter\.com/[^/]+/status/(\d+)', url) or r1(r'data-item-id="([^"]*)"', html) or \
r1(r'<meta name="twitter:site:id" content="([^"]*)"', html)
page_title = "{} [{}]".format(screen_name, item_id)
try: # extract images
urls = re.findall(r'property="og:image"\s*content="([^"]+:large)"', html)
assert urls
images = []
for url in urls:
url = ':'.join(url.split(':')[:-1]) + ':orig'
filename = parse.unquote(url.split('/')[-1])
title = '.'.join(filename.split('.')[:-1])
ext = url.split(':')[-2].split('.')[-1]
size = int(get_head(url)['Content-Length'])
images.append({'title': title,
'url': url,
'ext': ext,
'size': size})
size = sum([image['size'] for image in images])
print_info(site_info, page_title, images[0]['ext'], size)
authorization = 'Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'
if not info_only:
for image in images:
title = image['title']
ext = image['ext']
size = image['size']
url = image['url']
print_info(site_info, title, ext, size)
download_urls([url], title, ext, size,
output_dir=output_dir)
ga_url = 'https://api.twitter.com/1.1/guest/activate.json'
ga_content = post_content(ga_url, headers={'authorization': authorization})
guest_token = json.loads(ga_content)['guest_token']
except: # extract video
# always use i/cards or videos url
if not re.match(r'https?://twitter.com/i/', url):
url = r1(r'<meta\s*property="og:video:url"\s*content="([^"]+)"', html)
if not url:
url = 'https://twitter.com/i/videos/%s' % item_id
html = get_content(url)
api_url = 'https://api.twitter.com/2/timeline/conversation/%s.json?tweet_mode=extended' % item_id
api_content = get_content(api_url, headers={'authorization': authorization, 'x-guest-token': guest_token})
data_config = r1(r'data-config="([^"]*)"', html) or \
r1(r'data-player-config="([^"]*)"', html)
i = json.loads(unescape_html(data_config))
if 'video_url' in i:
source = i['video_url']
if not item_id: page_title = i['tweet_id']
elif 'playlist' in i:
source = i['playlist'][0]['source']
if not item_id: page_title = i['playlist'][0]['contentId']
elif 'vmap_url' in i:
vmap_url = i['vmap_url']
vmap = get_content(vmap_url)
source = r1(r'<MediaFile>\s*<!\[CDATA\[(.*)\]\]>', vmap)
if not item_id: page_title = i['tweet_id']
elif 'scribe_playlist_url' in i:
scribe_playlist_url = i['scribe_playlist_url']
return vine_download(scribe_playlist_url, output_dir, merge=merge, info_only=info_only)
info = json.loads(api_content)
if 'extended_entities' in info['globalObjects']['tweets'][item_id]:
# if the tweet contains media, download them
media = info['globalObjects']['tweets'][item_id]['extended_entities']['media']
try:
urls = extract_m3u(source)
except:
urls = [source]
elif info['globalObjects']['tweets'][item_id].get('is_quote_status') == True:
# if the tweet does not contain media, but it quotes a tweet
# and the quoted tweet contains media, download them
item_id = info['globalObjects']['tweets'][item_id]['quoted_status_id_str']
api_url = 'https://api.twitter.com/2/timeline/conversation/%s.json?tweet_mode=extended' % item_id
api_content = get_content(api_url, headers={'authorization': authorization, 'x-guest-token': guest_token})
info = json.loads(api_content)
if 'extended_entities' in info['globalObjects']['tweets'][item_id]:
media = info['globalObjects']['tweets'][item_id]['extended_entities']['media']
else:
# quoted tweet has no media
return
else:
# no media, no quoted tweet
return
for medium in media:
if 'video_info' in medium:
# FIXME: we're assuming one tweet only contains one video here
variants = medium['video_info']['variants']
variants = sorted(variants, key=lambda kv: kv.get('bitrate', 0))
urls = [ variants[-1]['url'] ]
size = urls_size(urls)
mime, ext = 'video/mp4', 'mp4'
mime, ext = variants[-1]['content_type'], 'mp4'
print_info(site_info, page_title, mime, size)
if not info_only:
download_urls(urls, page_title, ext, size, output_dir, merge=merge)
else:
title = item_id + '_' + medium['media_url_https'].split('.')[-2].split('/')[-1]
urls = [ medium['media_url_https'] + ':orig' ]
size = urls_size(urls)
ext = medium['media_url_https'].split('.')[-1]
print_info(site_info, title, ext, size)
if not info_only:
download_urls(urls, title, ext, size, output_dir, merge=merge)
site_info = "Twitter.com"
download = twitter_download
download_playlist = playlist_not_supported('twitter')

View File

@ -0,0 +1,137 @@
#!/usr/bin/env python
__all__ = ['ucas_download', 'ucas_download_single', 'ucas_download_playlist']
from ..common import *
import urllib.error
import http.client
from time import time
from random import random
import xml.etree.ElementTree as ET
from copy import copy
"""
Do not replace http.client with get_content here:
UCAS's server does not return the data correctly otherwise!
"""
def dictify(r,root=True):
"""http://stackoverflow.com/a/30923963/2946714"""
if root:
return {r.tag : dictify(r, False)}
d=copy(r.attrib)
if r.text:
d["_text"]=r.text
for x in r.findall("./*"):
if x.tag not in d:
d[x.tag]=[]
d[x.tag].append(dictify(x,False))
return d
def _get_video_query_url(resourceID):
# has to be like this
headers = {
'DNT': '1',
'Accept-Encoding': 'gzip, deflate',
'Accept-Language': 'en-CA,en;q=0.8,en-US;q=0.6,zh-CN;q=0.4,zh;q=0.2',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.47 Safari/537.36',
'Accept': '*/*',
'Referer': 'http://v.ucas.ac.cn/',
'Connection': 'keep-alive',
}
conn = http.client.HTTPConnection("210.76.211.10")
conn.request("GET", "/vplus/remote.do?method=query2&loginname=videocas&pwd=af1c7a4c5f77f790722f7cae474c37e281203765d423a23b&resource=%5B%7B%22resourceID%22%3A%22" + resourceID + "%22%2C%22on%22%3A1%2C%22time%22%3A600%2C%22eid%22%3A100%2C%22w%22%3A800%2C%22h%22%3A600%7D%5D&timeStamp=" + str(int(time())), headers=headers)
res = conn.getresponse()
data = res.read()
info = data.decode("utf-8")
return match1(info, r'video":"(.+)"')
def _get_virtualPath(video_query_url):
#getResourceJsCode2
html = get_content(video_query_url)
return match1(html, r"function\s+getVirtualPath\(\)\s+{\s+return\s+'(\w+)'")
def _get_video_list(resourceID):
""""""
conn = http.client.HTTPConnection("210.76.211.10")
conn.request("GET", '/vplus/member/resource.do?isyulan=0&method=queryFlashXmlByResourceId&resourceId={resourceID}&randoms={randoms}'.format(resourceID = resourceID,
randoms = random()))
res = conn.getresponse()
data = res.read()
video_xml = data.decode("utf-8")
root = ET.fromstring(video_xml.split('___!!!___')[0])
r = dictify(root)
huge_list = []
# main
huge_list.append([i['value'] for i in sorted(r['video']['mainUrl'][0]['_flv'][0]['part'][0]['video'], key=lambda k: int(k['index']))])
# sub
if '_flv' in r['video']['subUrl'][0]:
huge_list.append([i['value'] for i in sorted(r['video']['subUrl'][0]['_flv'][0]['part'][0]['video'], key=lambda k: int(k['index']))])
return huge_list
def _ucas_get_url_lists_by_resourceID(resourceID):
video_query_url = _get_video_query_url(resourceID)
assert video_query_url != '', 'Cannot find video GUID!'
virtualPath = _get_virtualPath(video_query_url)
assert virtualPath != '', 'Cannot find virtualPath!'
url_lists = _get_video_list(resourceID)
assert url_lists, 'Cannot find any URL to download!'
# make real url
# credit to a mate in UCAS
for video_type_id, video_urls in enumerate(url_lists):
for k, path in enumerate(video_urls):
url_lists[video_type_id][k] = 'http://210.76.211.10/vplus/member/resource.do?virtualPath={virtualPath}&method=getImgByStream&imgPath={path}'.format(virtualPath = virtualPath,
path = path)
return url_lists
def ucas_download_single(url, output_dir = '.', merge = False, info_only = False, **kwargs):
'''video page'''
html = get_content(url)
# resourceID is UUID
resourceID = re.findall( r'resourceID":"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})', html)[0]
assert resourceID != '', 'Cannot find resourceID!'
title = match1(html, r'<div class="bc-h">(.+)</div>')
url_lists = _ucas_get_url_lists_by_resourceID(resourceID)
assert url_lists, 'Cannot find any URL of such class!'
for k, part in enumerate(url_lists):
part_title = title + '_' + str(k)
print_info(site_info, part_title, 'flv', 0)
if not info_only:
download_urls(part, part_title, 'flv', total_size=None, output_dir=output_dir, merge=merge)
def ucas_download_playlist(url, output_dir = '.', merge = False, info_only = False, **kwargs):
'''course page'''
html = get_content(url)
parts = re.findall( r'(getplaytitle.do\?.+)"', html)
assert parts, 'No part found!'
for part_path in parts:
ucas_download('http://v.ucas.ac.cn/course/' + part_path, output_dir=output_dir, merge=merge, info_only=info_only)
def ucas_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
if 'classid=' in url and 'getplaytitle.do' in url:
ucas_download_single(url, output_dir=output_dir, merge=merge, info_only=info_only)
elif 'CourseIndex.do' in url:
ucas_download_playlist(url, output_dir=output_dir, merge=merge, info_only=info_only)
site_info = "UCAS"
download = ucas_download
download_playlist = ucas_download_playlist
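
dictify above converts the flash XML returned by the UCAS server into nested dicts: attributes become keys, element text goes under "_text", and repeated child tags collect into lists. A quick self-contained demonstration, with the helper repeated verbatim and a made-up XML snippet:

```
import xml.etree.ElementTree as ET
from copy import copy

def dictify(r, root=True):
    if root:
        return {r.tag: dictify(r, False)}
    d = copy(r.attrib)
    if r.text:
        d["_text"] = r.text
    for x in r.findall("./*"):
        if x.tag not in d:
            d[x.tag] = []
        d[x.tag].append(dictify(x, False))
    return d

root = ET.fromstring('<video><part index="1">a.flv</part><part index="2">b.flv</part></video>')
print(dictify(root))
# {'video': {'part': [{'index': '1', '_text': 'a.flv'}, {'index': '2', '_text': 'b.flv'}]}}
```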

View File

@ -6,12 +6,17 @@ from ..common import *
from .embed import *
def universal_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
try:
content_type = get_head(url, headers=fake_headers)['Content-Type']
except:
content_type = get_head(url, headers=fake_headers, get_method='GET')['Content-Type']
if content_type.startswith('text/html'):
try:
embed_download(url, output_dir, merge=merge, info_only=info_only)
except: pass
else: return
embed_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
except Exception:
pass
else:
return
domains = url.split('/')[2].split('.')
if len(domains) > 2: domains = domains[1:]
@ -26,6 +31,38 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
if page_title:
page_title = unescape_html(page_title)
meta_videos = re.findall(r'<meta property="og:video:url" content="([^"]*)"', page)
if meta_videos:
try:
for meta_video in meta_videos:
meta_video_url = unescape_html(meta_video)
type_, ext, size = url_info(meta_video_url)
print_info(site_info, page_title, type_, size)
if not info_only:
download_urls([meta_video_url], page_title,
ext, size,
output_dir=output_dir, merge=merge,
faker=True)
except:
pass
else:
return
hls_urls = re.findall(r'(https?://[^;"\'\\]+' + '\.m3u8?' +
r'[^;"\'\\]*)', page)
if hls_urls:
try:
for hls_url in hls_urls:
type_, ext, size = url_info(hls_url)
print_info(site_info, page_title, type_, size)
if not info_only:
download_url_ffmpeg(url=hls_url, title=page_title,
ext='mp4', output_dir=output_dir)
except:
pass
else:
return
# most common media file extensions on the Internet
media_exts = ['\.flv', '\.mp3', '\.mp4', '\.webm',
'[-_]1\d\d\d\.jpe?g', '[-_][6-9]\d\d\.jpe?g', # tumblr
@ -38,18 +75,49 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
urls = []
for i in media_exts:
urls += re.findall(r'(https?://[^;"\'\\]+' + i + r'[^;"\'\\]*)', page)
urls += re.findall(r'(https?://[^ ;&"\'\\<>]+' + i + r'[^ ;&"\'\\<>]*)', page)
p_urls = re.findall(r'(https?%3A%2F%2F[^;&]+' + i + r'[^;&]*)', page)
p_urls = re.findall(r'(https?%3A%2F%2F[^;&"]+' + i + r'[^;&"]*)', page)
urls += [parse.unquote(url) for url in p_urls]
q_urls = re.findall(r'(https?:\\\\/\\\\/[^;"\']+' + i + r'[^;"\']*)', page)
q_urls = re.findall(r'(https?:\\\\/\\\\/[^ ;"\'<>]+' + i + r'[^ ;"\'<>]*)', page)
urls += [url.replace('\\\\/', '/') for url in q_urls]
# a link href to an image is often an interesting one
urls += re.findall(r'href="(https?://[^"]+\.jpe?g)"', page)
urls += re.findall(r'href="(https?://[^"]+\.png)"', page)
urls += re.findall(r'href="(https?://[^"]+\.gif)"', page)
urls += re.findall(r'href="(https?://[^"]+\.jpe?g)"', page, re.I)
urls += re.findall(r'href="(https?://[^"]+\.png)"', page, re.I)
urls += re.findall(r'href="(https?://[^"]+\.gif)"', page, re.I)
# <img> with high widths
urls += re.findall(r'<img src="([^"]*)"[^>]*width="\d\d\d+"', page, re.I)
# relative path
rel_urls = []
rel_urls += re.findall(r'href="(\.[^"]+\.jpe?g)"', page, re.I)
rel_urls += re.findall(r'href="(\.[^"]+\.png)"', page, re.I)
rel_urls += re.findall(r'href="(\.[^"]+\.gif)"', page, re.I)
for rel_url in rel_urls:
urls += [ r1(r'(.*/)', url) + rel_url ]
# site-relative path
rel_urls = []
rel_urls += re.findall(r'href="(/[^"]+\.jpe?g)"', page, re.I)
rel_urls += re.findall(r'href="(/[^"]+\.png)"', page, re.I)
rel_urls += re.findall(r'href="(/[^"]+\.gif)"', page, re.I)
for rel_url in rel_urls:
urls += [ r1(r'(https?://[^/]+)', url) + rel_url ]
# sometimes naive
urls += re.findall(r'data-original="(https?://[^"]+\.jpe?g)"', page, re.I)
urls += re.findall(r'data-original="(https?://[^"]+\.png)"', page, re.I)
urls += re.findall(r'data-original="(https?://[^"]+\.gif)"', page, re.I)
# MPEG-DASH MPD
mpd_urls = re.findall(r'src="(https?://[^"]+\.mpd)"', page)
for mpd_url in mpd_urls:
cont = get_content(mpd_url)
base_url = r1(r'<BaseURL>(.*)</BaseURL>', cont)
urls += [ r1(r'(.*/)[^/]*', mpd_url) + base_url ]
# have some candy!
candies = []
@ -57,23 +125,35 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
for url in set(urls):
filename = parse.unquote(url.split('/')[-1])
if 5 <= len(filename) <= 80:
title = '.'.join(filename.split('.')[:-1])
title = '.'.join(filename.split('.')[:-1]) or filename
else:
title = '%s' % i
i += 1
if r1(r'(https://pinterest.com/pin/)', url):
continue
candies.append({'url': url,
'title': title})
for candy in candies:
try:
try:
mime, ext, size = url_info(candy['url'], faker=False)
assert size
except:
mime, ext, size = url_info(candy['url'], faker=True)
if not size: size = float('Int')
if not size: size = float('Inf')
except:
continue
else:
print_info(site_info, candy['title'], ext, size)
if not info_only:
try:
download_urls([candy['url']], candy['title'], ext, size,
output_dir=output_dir, merge=merge,
faker=False)
except:
download_urls([candy['url']], candy['title'], ext, size,
output_dir=output_dir, merge=merge,
faker=True)
@ -81,10 +161,10 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
else:
# direct download
filename = parse.unquote(url.split('/')[-1])
title = '.'.join(filename.split('.')[:-1])
ext = filename.split('.')[-1]
_, _, size = url_info(url, faker=True)
url_trunk = url.split('?')[0] # strip query string
filename = parse.unquote(url_trunk.split('/')[-1]) or parse.unquote(url_trunk.split('/')[-2])
title = '.'.join(filename.split('.')[:-1]) or filename
_, ext, size = url_info(url, faker=True)
print_info(site_info, title, ext, size)
if not info_only:
download_urls([url], title, ext, size,

View File

@ -1,44 +0,0 @@
#!/usr/bin/env python
__all__ = ['videomega_download']
from ..common import *
import ssl
def videomega_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
# Hot-plug cookie handler
ssl_context = request.HTTPSHandler(
context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
cookie_handler = request.HTTPCookieProcessor()
opener = request.build_opener(ssl_context, cookie_handler)
opener.addheaders = [('Referer', url),
('Cookie', 'noadvtday=0')]
request.install_opener(opener)
if re.search(r'view\.php', url):
php_url = url
else:
content = get_content(url)
m = re.search(r'ref="([^"]*)";\s*width="([^"]*)";\s*height="([^"]*)"', content)
ref = m.group(1)
width, height = m.group(2), m.group(3)
php_url = 'http://videomega.tv/view.php?ref=%s&width=%s&height=%s' % (ref, width, height)
content = get_content(php_url)
title = match1(content, r'<title>(.*)</title>')
js = match1(content, r'(eval.*)')
t = match1(js, r'\$\("\w+"\)\.\w+\("\w+","([^"]+)"\)')
t = re.sub(r'(\w)', r'{\1}', t)
t = t.translate({87 + i: str(i) for i in range(10, 36)})
s = match1(js, r"'([^']+)'\.split").split('|')
src = t.format(*s)
type, ext, size = url_info(src, faker=True)
print_info(site_info, title, type, size)
if not info_only:
download_urls([src], title, ext, size, output_dir, merge=merge, faker=True)
site_info = "Videomega.tv"
download = videomega_download
download_playlist = playlist_not_supported('videomega')

View File

@ -1,40 +0,0 @@
#!/usr/bin/env python
__all__ = ['vidto_download']
from ..common import *
import pdb
import time
def vidto_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_content(url)
params = {}
r = re.findall(
r'type="(?:hidden|submit)?"(?:.*?)name="(.+?)"\s* value="?(.+?)">', html)
for name, value in r:
params[name] = value
data = parse.urlencode(params).encode('utf-8')
req = request.Request(url)
print("Please wait for 6 seconds...")
time.sleep(6)
print("Starting")
new_html = request.urlopen(req, data).read().decode('utf-8', 'replace')
new_stff = re.search('lnk_download" href="(.*?)">', new_html)
if(new_stff):
url = new_stff.group(1)
title = params['fname']
type = ""
ext = ""
a, b, size = url_info(url)
print_info(site_info, title, type, size)
if not info_only:
download_urls([url], title, ext, size, output_dir, merge=merge)
else:
print("cannot find link, please review")
pdb.set_trace()
site_info = "vidto.me"
download = vidto_download
download_playlist = playlist_not_supported('vidto')

View File

@ -3,7 +3,12 @@
__all__ = ['vimeo_download', 'vimeo_download_by_id', 'vimeo_download_by_channel', 'vimeo_download_by_channel_id']
from ..common import *
from ..util.log import *
from ..extractor import VideoExtractor
from json import loads
import urllib.error
import urllib.parse
access_token = 'f6785418277b72c7c87d3132c79eec24' #By Beining
#----------------------------------------------------------------------
@ -11,10 +16,10 @@ def vimeo_download_by_channel(url, output_dir='.', merge=False, info_only=False,
"""str->None"""
# https://vimeo.com/channels/464686
channel_id = match1(url, r'http://vimeo.com/channels/(\w+)')
vimeo_download_by_channel_id(channel_id, output_dir, merge, info_only)
vimeo_download_by_channel_id(channel_id, output_dir, merge, info_only, **kwargs)
#----------------------------------------------------------------------
def vimeo_download_by_channel_id(channel_id, output_dir='.', merge=False, info_only=False):
def vimeo_download_by_channel_id(channel_id, output_dir='.', merge=False, info_only=False, **kwargs):
"""str/int->None"""
html = get_content('https://api.vimeo.com/channels/{channel_id}/videos?access_token={access_token}'.format(channel_id=channel_id, access_token=access_token))
data = loads(html)
@ -25,15 +30,116 @@ def vimeo_download_by_channel_id(channel_id, output_dir='.', merge=False, info_o
id_list.append(match1(i['uri'], r'/videos/(\w+)'))
for id in id_list:
vimeo_download_by_id(id, None, output_dir, merge, info_only)
try:
vimeo_download_by_id(id, None, output_dir, merge, info_only, **kwargs)
except urllib.error.URLError as e:
log.w('{} failed with {}'.format(id, e))
class VimeoExtractor(VideoExtractor):
stream_types = [
{'id': '2160p', 'video_profile': '3840x2160'},
{'id': '1440p', 'video_profile': '2560x1440'},
{'id': '1080p', 'video_profile': '1920x1080'},
{'id': '720p', 'video_profile': '1280x720'},
{'id': '540p', 'video_profile': '960x540'},
{'id': '360p', 'video_profile': '640x360'}
]
name = 'Vimeo'
def prepare(self, **kwargs):
headers = fake_headers.copy()
if 'referer' in kwargs:
headers['Referer'] = kwargs['referer']
try:
page = get_content('https://vimeo.com/{}'.format(self.vid))
cfg_patt = r'clip_page_config\s*=\s*(\{.+?\});'
cfg = json.loads(match1(page, cfg_patt))
video_page = get_content(cfg['player']['config_url'], headers=headers)
self.title = cfg['clip']['title']
info = json.loads(video_page)
except Exception as e:
page = get_content('https://player.vimeo.com/video/{}'.format(self.vid))
self.title = r1(r'<title>([^<]+)</title>', page)
info = json.loads(match1(page, r'var t=(\{.+?\});'))
plain = info['request']['files']['progressive']
for s in plain:
meta = dict(src=[s['url']], container='mp4')
meta['video_profile'] = '{}x{}'.format(s['width'], s['height'])
for stream in self.__class__.stream_types:
if s['quality'] == stream['id']:
self.streams[s['quality']] = meta
self.master_m3u8 = info['request']['files']['hls']['cdns']
def extract(self, **kwargs):
for s in self.streams:
self.streams[s]['size'] = urls_size(self.streams[s]['src'])
master_m3u8s = []
for m in self.master_m3u8:
master_m3u8s.append(self.master_m3u8[m]['url'])
master_content = None
master_url = None
for master_u in master_m3u8s:
try:
master_content = get_content(master_u).split('\n')
except urllib.error.URLError:
continue
else:
master_url = master_u
if master_content is None:
return
lines = []
for line in master_content:
if len(line.strip()) > 0:
lines.append(line.strip())
pos = 0
while pos < len(lines):
if lines[pos].startswith('#EXT-X-STREAM-INF'):
patt = r'RESOLUTION=(\d+)x(\d+)'
hit = re.search(patt, lines[pos])
if hit is None:
pos += 1  # advance, otherwise a missing RESOLUTION attribute loops forever
continue
width = hit.group(1)
height = hit.group(2)
if height in ('2160', '1440'):
m3u8_url = urllib.parse.urljoin(master_url, lines[pos+1])
meta = dict(m3u8_url=m3u8_url, container='m3u8')
if height == '1440':
meta['video_profile'] = '2560x1440'
else:
meta['video_profile'] = '3840x2160'
meta['size'] = 0
meta['src'] = general_m3u8_extractor(m3u8_url)
self.streams[height+'p'] = meta
pos += 2
else:
pos += 1
self.streams_sorted = []
for stream_type in self.stream_types:
if stream_type['id'] in self.streams:
item = [('id', stream_type['id'])] + list(self.streams[stream_type['id']].items())
self.streams_sorted.append(dict(item))
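# For reference, a sketch of the master playlist that extract() walks through
# (the contents below are hypothetical, not taken from a real Vimeo response):
#
#   #EXT-X-STREAM-INF:BANDWIDTH=16000000,RESOLUTION=3840x2160,CODECS="..."
#   3840x2160/playlist.m3u8
#   #EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=2560x1440,CODECS="..."
#   2560x1440/playlist.m3u8
#
# Only the 2160/1440 variants are turned into m3u8 streams here; the lower
# resolutions are typically covered by the progressive MP4 sources collected in prepare().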
def vimeo_download_by_id(id, title=None, output_dir='.', merge=True, info_only=False, **kwargs):
'''
try:
# normal Vimeo video
html = get_content('https://vimeo.com/' + id)
config_url = unescape_html(r1(r'data-config-url="([^"]+)"', html))
video_page = get_content(config_url, headers=fake_headers)
title = r1(r'"title":"([^"]+)"', video_page)
cfg_patt = r'clip_page_config\s*=\s*(\{.+?\});'
cfg = json.loads(match1(html, cfg_patt))
video_page = get_content(cfg['player']['config_url'], headers=fake_headers)
title = cfg['clip']['title']
info = loads(video_page)
except:
# embedded player - referer may be required
@ -42,7 +148,7 @@ def vimeo_download_by_id(id, title=None, output_dir='.', merge=True, info_only=F
video_page = get_content('http://player.vimeo.com/video/%s' % id, headers=fake_headers)
title = r1(r'<title>([^<]+)</title>', video_page)
info = loads(match1(video_page, r'var t=(\{[^;]+\});'))
info = loads(match1(video_page, r'var t=(\{.+?\});'))
streams = info['request']['files']['progressive']
streams = sorted(streams, key=lambda i: i['height'])
@ -53,6 +159,9 @@ def vimeo_download_by_id(id, title=None, output_dir='.', merge=True, info_only=F
print_info(site_info, title, type, size)
if not info_only:
download_urls([url], title, ext, size, output_dir, merge=merge, faker=True)
'''
site = VimeoExtractor()
site.download_by_vid(id, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
def vimeo_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
if re.match(r'https?://vimeo.com/channels/\w+', url):

View File

@ -3,15 +3,26 @@
__all__ = ['vine_download']
from ..common import *
import json
def vine_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url)
html = get_content(url)
vid = r1(r'vine.co/v/([^/]+)', url)
video_id = r1(r'vine.co/v/([^/]+)', url)
title = r1(r'<title>([^<]*)</title>', html)
stream = r1(r'<meta property="twitter:player:stream" content="([^"]*)">', html)
if not stream: # https://vine.co/v/.../card
stream = r1(r'"videoUrl":"([^"]+)"', html).replace('\\/', '/')
stream = r1(r'"videoUrl":"([^"]+)"', html)
if stream:
stream = stream.replace('\\/', '/')
else:
posts_url = 'https://archive.vine.co/posts/' + video_id + '.json'
json_data = json.loads(get_content(posts_url))
stream = json_data['videoDashUrl']
title = json_data['description']
if title == "":
title = json_data['username'].replace(" ", "_") + "_" + video_id
mime, ext, size = url_info(stream)
@ -19,6 +30,7 @@ def vine_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
if not info_only:
download_urls([stream], title, ext, size, output_dir, merge=merge)
site_info = "Vine.co"
download = vine_download
download_playlist = playlist_not_supported('vine')

View File

@ -22,6 +22,19 @@ def get_video_info(url):
return url, title, ext, size
def get_video_from_user_videolist(url):
ep = 'https://vk.com/al_video.php'
to_post = dict(act='show', al=1, module='direct', video=re.search(r'video(\d+_\d+)', url).group(1))
page = post_content(ep, post_data=to_post)
video_pt = r'<source src="(.+?)" type="video\/mp4"'
url = re.search(video_pt, page).group(1)
title = re.search(r'<div class="mv_title".+?>(.+?)</div>', page).group(1)
mime, ext, size = url_info(url)
print_info(site_info, title, mime, size)
return url, title, ext, size
def get_image_info(url):
image_page = get_content(url)
# used for title - vk page owner
@ -43,6 +56,8 @@ def vk_download(url, output_dir='.', stream_type=None, merge=True, info_only=Fal
link, title, ext, size = get_video_info(url)
elif re.match(r'(.+)vk\.com\/photo(.+)', url):
link, title, ext, size = get_image_info(url)
elif re.search(r'vk\.com\/video\d+_\d+', url):
link, title, ext, size = get_video_from_user_videolist(url)
else:
raise NotImplementedError('Nothing to download here')

View File

@ -28,51 +28,52 @@ def location_dec(str):
return parse.unquote(out).replace("^", "0")
def xiami_download_lyric(lrc_url, file_name, output_dir):
lrc = get_html(lrc_url, faker = True)
lrc = get_content(lrc_url, headers=fake_headers)
filename = get_filename(file_name)
if len(lrc) > 0:
with open(output_dir + "/" + filename + '.lrc', 'w', encoding='utf-8') as x:
x.write(lrc)
def xiami_download_pic(pic_url, file_name, output_dir):
from ..util.strings import get_filename
pic_url = pic_url.replace('_1', '')
pos = pic_url.rfind('.')
ext = pic_url[pos:]
pic = get_response(pic_url, faker = True).data
pic = get_content(pic_url, headers=fake_headers, decoded=False)
if len(pic) > 0:
with open(output_dir + "/" + file_name.replace('/', '-') + ext, 'wb') as x:
x.write(pic)
def xiami_download_song(sid, output_dir = '.', merge = True, info_only = False):
xml = get_html('http://www.xiami.com/song/playlist/id/%s/object_name/default/object_id/0' % sid, faker = True)
def xiami_download_song(sid, output_dir = '.', info_only = False):
xml = get_content('http://www.xiami.com/song/playlist/id/%s/object_name/default/object_id/0' % sid, headers=fake_headers)
doc = parseString(xml)
i = doc.getElementsByTagName("track")[0]
artist = i.getElementsByTagName("artist")[0].firstChild.nodeValue
album_name = i.getElementsByTagName("album_name")[0].firstChild.nodeValue
song_title = i.getElementsByTagName("title")[0].firstChild.nodeValue
song_title = i.getElementsByTagName("name")[0].firstChild.nodeValue
url = location_dec(i.getElementsByTagName("location")[0].firstChild.nodeValue)
try:
lrc_url = i.getElementsByTagName("lyric")[0].firstChild.nodeValue
except:
pass
type, ext, size = url_info(url, faker = True)
type_, ext, size = url_info(url, headers=fake_headers)
if not ext:
ext = 'mp3'
print_info(site_info, song_title, ext, size)
if not info_only:
file_name = "%s - %s - %s" % (song_title, artist, album_name)
download_urls([url], file_name, ext, size, output_dir, merge = merge, faker = True)
download_urls([url], file_name, ext, size, output_dir, headers=fake_headers)
try:
xiami_download_lyric(lrc_url, file_name, output_dir)
except:
pass
def xiami_download_showcollect(cid, output_dir = '.', merge = True, info_only = False):
html = get_html('http://www.xiami.com/song/showcollect/id/' + cid, faker = True)
def xiami_download_showcollect(cid, output_dir = '.', info_only = False):
html = get_content('http://www.xiami.com/song/showcollect/id/' + cid, headers=fake_headers)
collect_name = r1(r'<title>(.*)</title>', html)
xml = get_html('http://www.xiami.com/song/playlist/id/%s/type/3' % cid, faker = True)
xml = get_content('http://www.xiami.com/song/playlist/id/%s/type/3' % cid, headers=fake_headers)
doc = parseString(xml)
output_dir = output_dir + "/" + "[" + collect_name + "]"
tracks = doc.getElementsByTagName("track")
@ -92,14 +93,14 @@ def xiami_download_showcollect(cid, output_dir = '.', merge = True, info_only =
lrc_url = i.getElementsByTagName("lyric")[0].firstChild.nodeValue
except:
pass
type, ext, size = url_info(url, faker = True)
type_, ext, size = url_info(url, headers=fake_headers)
if not ext:
ext = 'mp3'
print_info(site_info, song_title, type, size)
print_info(site_info, song_title, ext, size)
if not info_only:
file_name = "%02d.%s - %s - %s" % (track_nr, song_title, artist, album_name)
download_urls([url], file_name, ext, size, output_dir, merge = merge, faker = True)
download_urls([url], file_name, ext, size, output_dir, headers=fake_headers)
try:
xiami_download_lyric(lrc_url, file_name, output_dir)
except:
@ -107,17 +108,22 @@ def xiami_download_showcollect(cid, output_dir = '.', merge = True, info_only =
track_nr += 1
def xiami_download_album(aid, output_dir = '.', merge = True, info_only = False):
xml = get_html('http://www.xiami.com/song/playlist/id/%s/type/1' % aid, faker = True)
def xiami_download_album(aid, output_dir='.', info_only=False):
xml = get_content('http://www.xiami.com/song/playlist/id/%s/type/1' % aid, headers=fake_headers)
album_name = r1(r'<album_name><!\[CDATA\[(.*)\]\]>', xml)
artist = r1(r'<artist><!\[CDATA\[(.*)\]\]>', xml)
doc = parseString(xml)
output_dir = output_dir + "/%s - %s" % (artist, album_name)
tracks = doc.getElementsByTagName("track")
track_list = doc.getElementsByTagName('trackList')[0]
tracks = track_list.getElementsByTagName("track")
track_nr = 1
pic_exist = False
for i in tracks:
song_title = i.getElementsByTagName("title")[0].firstChild.nodeValue
# in this XML the "track" tag is used both for a track inside a trackList and for the track number
# dirty workaround here
if i.firstChild.nodeValue is not None:
continue
song_title = i.getElementsByTagName("songName")[0].firstChild.nodeValue
url = location_dec(i.getElementsByTagName("location")[0].firstChild.nodeValue)
try:
lrc_url = i.getElementsByTagName("lyric")[0].firstChild.nodeValue
@ -125,14 +131,14 @@ def xiami_download_album(aid, output_dir = '.', merge = True, info_only = False)
pass
if not pic_exist:
pic_url = i.getElementsByTagName("pic")[0].firstChild.nodeValue
type, ext, size = url_info(url, faker = True)
type_, ext, size = url_info(url, headers=fake_headers)
if not ext:
ext = 'mp3'
print_info(site_info, song_title, type, size)
print_info(site_info, song_title, ext, size)
if not info_only:
file_name = "%02d.%s" % (track_nr, song_title)
download_urls([url], file_name, ext, size, output_dir, merge = merge, faker = True)
download_urls([url], file_name, ext, size, output_dir, headers=fake_headers)
try:
xiami_download_lyric(lrc_url, file_name, output_dir)
except:
@ -143,22 +149,66 @@ def xiami_download_album(aid, output_dir = '.', merge = True, info_only = False)
track_nr += 1
def xiami_download(url, output_dir = '.', stream_type = None, merge = True, info_only = False, **kwargs):
def xiami_download_mv(url, output_dir='.', merge=True, info_only=False):
# FIXME: broken merge
page = get_content(url, headers=fake_headers)
title = re.findall('<title>([^<]+)', page)[0]
vid, uid = re.findall(r'vid:"(\d+)",uid:"(\d+)"', page)[0]
api_url = 'http://cloud.video.taobao.com/videoapi/info.php?vid=%s&uid=%s' % (vid, uid)
result = get_content(api_url, headers=fake_headers)
doc = parseString(result)
video_url = doc.getElementsByTagName("video_url")[-1].firstChild.nodeValue
length = int(doc.getElementsByTagName("length")[-1].firstChild.nodeValue)
v_urls = []
k_start = 0
total_size = 0
while True:
k_end = k_start + 20000000
if k_end >= length: k_end = length - 1
v_url = video_url + '/start_%s/end_%s/1.flv' % (k_start, k_end)
try:
_, ext, size = url_info(v_url)
except:
break
v_urls.append(v_url)
total_size += size
k_start = k_end + 1
print_info(site_info, title, ext, total_size)
if not info_only:
download_urls(v_urls, title, ext, total_size, output_dir, merge=merge, headers=fake_headers)
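# A rough illustration of the probing loop above (the length value is hypothetical):
# with length = 50000000, the candidate segment URLs become
#   <video_url>/start_0/end_20000000/1.flv
#   <video_url>/start_20000001/end_40000001/1.flv
#   <video_url>/start_40000002/end_49999999/1.flv
# and the loop stops at the first candidate that url_info() fails to resolve.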
def xiami_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
#albums
if re.match(r'http://www.xiami.com/album/\d+', url):
id = r1(r'http://www.xiami.com/album/(\d+)', url)
xiami_download_album(id, output_dir, merge, info_only)
xiami_download_album(id, output_dir, info_only)
elif re.match(r'http://www.xiami.com/album/\w+', url):
page = get_content(url, headers=fake_headers)
album_id = re.search(r'rel="canonical"\s+href="http://www.xiami.com/album/([^"]+)"', page).group(1)
xiami_download_album(album_id, output_dir, info_only)
#collections
if re.match(r'http://www.xiami.com/collect/\d+', url):
id = r1(r'http://www.xiami.com/collect/(\d+)', url)
xiami_download_showcollect(id, output_dir, merge, info_only)
xiami_download_showcollect(id, output_dir, info_only)
if re.match('http://www.xiami.com/song/\d+', url):
#single track
if re.match(r'http://www.xiami.com/song/\d+\b', url):
id = r1(r'http://www.xiami.com/song/(\d+)', url)
xiami_download_song(id, output_dir, merge, info_only)
xiami_download_song(id, output_dir, info_only)
elif re.match(r'http://www.xiami.com/song/\w+', url):
html = get_content(url, headers=fake_headers)
id = r1(r'rel="canonical" href="http://www.xiami.com/song/([^"]+)"', html)
xiami_download_song(id, output_dir, info_only)
if re.match('http://www.xiami.com/song/detail/id/\d+', url):
id = r1(r'http://www.xiami.com/song/detail/id/(\d+)', url)
xiami_download_song(id, output_dir, merge, info_only)
xiami_download_song(id, output_dir, info_only)
if re.match('http://www.xiami.com/mv', url):
xiami_download_mv(url, output_dir, merge=merge, info_only=info_only)
site_info = "Xiami.com"
download = xiami_download

View File

@ -0,0 +1,98 @@
#!/usr/bin/env python
__all__ = ['ximalaya_download_playlist', 'ximalaya_download', 'ximalaya_download_by_id']
from ..common import *
import json
import re
stream_types = [
{'itag': '1', 'container': 'm4a', 'bitrate': 'default'},
{'itag': '2', 'container': 'm4a', 'bitrate': '32'},
{'itag': '3', 'container': 'm4a', 'bitrate': '64'}
]
def ximalaya_download_by_id(id, title = None, output_dir = '.', info_only = False, stream_id = None):
BASE_URL = 'http://www.ximalaya.com/tracks/'
json_url = BASE_URL + id + '.json'
json_data = json.loads(get_content(json_url, headers=fake_headers))
if 'res' in json_data:
if json_data['res'] == False:
raise ValueError('Server reported id %s is invalid' % id)
if 'is_paid' in json_data and json_data['is_paid']:
if 'is_free' in json_data and not json_data['is_free']:
raise ValueError('%s is a paid item' % id)
if (not title) and 'title' in json_data:
title = json_data['title']
# no size data in the JSON; should it be calculated?
size = 0
url = json_data['play_path_64']
if stream_id:
if stream_id == '1':
url = json_data['play_path_32']
elif stream_id == '0':
url = json_data['play_path']
logging.debug('ximalaya_download_by_id: %s' % url)
ext = 'm4a'
urls = [url]
print('Site: %s' % site_info)
print('title: %s' % title)
if info_only:
if stream_id:
print_stream_info(stream_id)
else:
for item in range(0, len(stream_types)):
print_stream_info(item)
if not info_only:
print('Type: MPEG-4 audio m4a')
print('Size: N/A')
download_urls(urls, title, ext, size, output_dir = output_dir, merge = False)
def ximalaya_download(url, output_dir = '.', info_only = False, stream_id = None, **kwargs):
if re.match(r'http://www\.ximalaya\.com/(\d+)/sound/(\d+)', url):
id = match1(url, r'http://www\.ximalaya\.com/\d+/sound/(\d+)')
else:
raise NotImplementedError(url)
ximalaya_download_by_id(id, output_dir = output_dir, info_only = info_only, stream_id = stream_id)
def ximalaya_download_page(playlist_url, output_dir = '.', info_only = False, stream_id = None, **kwargs):
if re.match(r'http://www\.ximalaya\.com/(\d+)/album/(\d+)', playlist_url):
page_content = get_content(playlist_url)
pattern = re.compile(r'<li sound_id="(\d+)"')
ids = pattern.findall(page_content)
for id in ids:
try:
ximalaya_download_by_id(id, output_dir=output_dir, info_only=info_only, stream_id=stream_id)
except(ValueError):
print("something wrong with %s, perhaps paid item?" % id)
else:
raise NotImplementedError(playlist_url)
def ximalaya_download_playlist(url, output_dir='.', info_only=False, stream_id=None, **kwargs):
match_result = re.match(r'http://www\.ximalaya\.com/(\d+)/album/(\d+)', url)
if not match_result:
raise NotImplementedError(url)
pages = []
page_content = get_content(url)
if page_content.find('<div class="pagingBar_wrapper"') == -1:
pages.append(url)
else:
base_url = 'http://www.ximalaya.com/' + match_result.group(1) + '/album/' + match_result.group(2)
html_str = '<a href=(\'|")\/' + match_result.group(1) + '\/album\/' + match_result.group(2) + '\?page='
count = len(re.findall(html_str, page_content))
for page_num in range(count):
pages.append(base_url + '?page=' +str(page_num+1))
print(pages[-1])
for page in pages:
ximalaya_download_page(page, output_dir=output_dir, info_only=info_only, stream_id=stream_id)
def print_stream_info(stream_id):
print(' - itag: %s' % stream_id)
print(' container: %s' % 'm4a')
print(' bitrate: %s' % stream_types[int(stream_id)]['bitrate'])
print(' size: %s' % 'N/A')
print(' # download-with: you-get --itag=%s [URL]' % stream_id)
site_info = 'ximalaya.com'
download = ximalaya_download
download_playlist = ximalaya_download_playlist
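# A minimal usage sketch of the module-level entry points defined above
# (the import path and URLs are hypothetical; they only need to match the
# patterns handled by ximalaya_download / ximalaya_download_playlist):
#
#   from you_get.extractors import ximalaya
#   # print metadata only, for a single track
#   ximalaya.download('http://www.ximalaya.com/1000202/sound/12345', info_only=True)
#   # download a whole album, page by page
#   ximalaya.download_playlist('http://www.ximalaya.com/1000202/album/67890', output_dir='.')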

View File

@ -0,0 +1,44 @@
#!/usr/bin/env python
import re
import json
from ..extractor import VideoExtractor
from ..common import get_content, playlist_not_supported
class Xinpianchang(VideoExtractor):
name = 'xinpianchang'
stream_types = [
{'id': '4K', 'quality': '超清 4K', 'video_profile': 'mp4-4K'},
{'id': '2K', 'quality': '超清 2K', 'video_profile': 'mp4-2K'},
{'id': '1080', 'quality': '高清 1080P', 'video_profile': 'mp4-FHD'},
{'id': '720', 'quality': '高清 720P', 'video_profile': 'mp4-HD'},
{'id': '540', 'quality': '清晰 540P', 'video_profile': 'mp4-SD'},
{'id': '360', 'quality': '流畅 360P', 'video_profile': 'mp4-LD'}
]
def prepare(self, **kwargs):
# find key
page_content = get_content(self.url)
match_rule = r"vid = \"(.+?)\";"
key = re.findall(match_rule, page_content)[0]
# get videos info
video_url = 'https://openapi-vtom.vmovier.com/v3/video/' + key + '?expand=resource'
data = json.loads(get_content(video_url))
self.title = data["data"]["video"]["title"]
video_info = data["data"]["resource"]["progressive"]
# set streams dict
for video in video_info:
url = video["https_url"]
size = video["filesize"]
profile = video["profile_code"]
stype = [st for st in self.__class__.stream_types if st['video_profile'] == profile][0]
stream_data = dict(src=[url], size=size, container='mp4', quality=stype['quality'])
self.streams[stype['id']] = stream_data
download = Xinpianchang().download_by_url
download_playlist = playlist_not_supported('xinpianchang')

View File

@ -7,6 +7,24 @@ from urllib.parse import urlparse
from json import loads
import re
#----------------------------------------------------------------------
def miaopai_download_by_smid(smid, output_dir = '.', merge = True, info_only = False):
""""""
api_endpoint = 'https://n.miaopai.com/api/aj_media/info.json?smid={smid}'.format(smid = smid)
html = get_content(api_endpoint)
api_content = loads(html)
video_url = api_content['data']['meta_data'][0]['play_urls']['l']
title = api_content['data']['description']
type, ext, size = url_info(video_url)
print_info(site_info, title, type, size)
if not info_only:
download_urls([video_url], title, ext, size, output_dir, merge=merge)
#----------------------------------------------------------------------
def yixia_miaopai_download_by_scid(scid, output_dir = '.', merge = True, info_only = False):
""""""
@ -47,16 +65,18 @@ def yixia_xiaokaxiu_download_by_scid(scid, output_dir = '.', merge = True, info_
def yixia_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
"""wrapper"""
hostname = urlparse(url).hostname
if 'miaopai.com' in hostname: #Miaopai
if 'n.miaopai.com' == hostname:
smid = match1(url, r'n\.miaopai\.com/media/([^.]+)')
miaopai_download_by_smid(smid, output_dir, merge, info_only)
return
elif 'miaopai.com' in hostname: #Miaopai
yixia_download_by_scid = yixia_miaopai_download_by_scid
site_info = "Yixia Miaopai"
if re.match(r'http://www.miaopai.com/show/channel/\w+', url): #PC
scid = match1(url, r'http://www.miaopai.com/show/channel/(.+)\.htm')
elif re.match(r'http://www.miaopai.com/show/\w+', url): #PC
scid = match1(url, r'http://www.miaopai.com/show/(.+)\.htm')
elif re.match(r'http://m.miaopai.com/show/channel/\w+', url): #Mobile
scid = match1(url, r'http://m.miaopai.com/show/channel/(.+)\.htm')
scid = match1(url, r'miaopai\.com/show/channel/([^.]+)\.htm') or \
match1(url, r'miaopai\.com/show/([^.]+)\.htm') or \
match1(url, r'm\.miaopai\.com/show/channel/([^.]+)\.htm') or \
match1(url, r'm\.miaopai\.com/show/channel/([^.]+)')
elif 'xiaokaxiu.com' in hostname: #Xiaokaxiu
yixia_download_by_scid = yixia_xiaokaxiu_download_by_scid

View File

@ -0,0 +1,37 @@
#!/usr/bin/env python
__all__ = ['yizhibo_download']
from ..common import *
import json
import time
def yizhibo_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
video_id = url[url.rfind('/')+1:].split(".")[0]
json_request_url = 'http://www.yizhibo.com/live/h5api/get_basic_live_info?scid={}'.format(video_id)
content = get_content(json_request_url)
error = json.loads(content)['result']
if (error != 1):
raise ValueError("Error : {}".format(error))
data = json.loads(content)
title = data.get('data')['live_title']
if (title == ''):
title = data.get('data')['nickname']
m3u8_url = data.get('data')['play_url']
m3u8 = get_content(m3u8_url)
base_url = "/".join(data.get('data')['play_url'].split("/")[:7])+"/"
part_url = re.findall(r'([0-9]+\.ts)', m3u8)
real_url = []
for i in part_url:
url = base_url + i
real_url.append(url)
print_info(site_info, title, 'ts', float('inf'))
if not info_only:
if player:
launch_player(player, [m3u8_url])
download_urls(real_url, title, 'ts', float('inf'), output_dir, merge = merge)
site_info = "yizhibo.com"
download = yizhibo_download
download_playlist = playlist_not_supported('yizhibo')

View File

@ -4,217 +4,197 @@
from ..common import *
from ..extractor import VideoExtractor
import base64
import ssl
import time
import traceback
import json
import urllib.request
import urllib.parse
def fetch_cna():
def quote_cna(val):
if '%' in val:
return val
return urllib.parse.quote(val)
if cookies:
for cookie in cookies:
if cookie.name == 'cna' and cookie.domain == '.youku.com':
log.i('Found cna in imported cookies. Use it')
return quote_cna(cookie.value)
url = 'http://log.mmstat.com/eg.js'
req = urllib.request.urlopen(url)
headers = req.getheaders()
for header in headers:
if header[0].lower() == 'set-cookie':
n_v = header[1].split(';')[0]
name, value = n_v.split('=')
if name == 'cna':
return quote_cna(value)
log.w('It seems that the client failed to fetch a cna cookie. Please load your own cookie if possible')
return quote_cna('DOG4EdW4qzsCAbZyXbU+t7Jt')
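# For example (hypothetical header value): a response header of
#   Set-Cookie: cna=abc123XYZ; path=/; domain=.mmstat.com
# yields name 'cna' and value 'abc123XYZ', which is then percent-quoted and
# used as the utid in the ups request below.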
class Youku(VideoExtractor):
name = "优酷 (Youku)"
mobile_ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36'
dispatcher_url = 'vali.cp31.ott.cibntv.net'
# Last updated: 2015-11-24
stream_types = [
{'id': 'mp4hd3', 'alias-of' : 'hd3'},
{'id': 'hd3', 'container': 'flv', 'video_profile': '1080P'},
{'id': 'mp4hd2', 'alias-of' : 'hd2'},
{'id': 'hd3v2', 'container': 'flv', 'video_profile': '1080P'},
{'id': 'mp4hd3', 'container': 'mp4', 'video_profile': '1080P'},
{'id': 'mp4hd3v2', 'container': 'mp4', 'video_profile': '1080P'},
{'id': 'hd2', 'container': 'flv', 'video_profile': '超清'},
{'id': 'mp4hd', 'alias-of' : 'mp4'},
{'id': 'mp4', 'container': 'mp4', 'video_profile': '高清'},
{'id': 'flvhd', 'container': 'flv', 'video_profile': '标清'},
{'id': 'hd2v2', 'container': 'flv', 'video_profile': '超清'},
{'id': 'mp4hd2', 'container': 'mp4', 'video_profile': '超清'},
{'id': 'mp4hd2v2', 'container': 'mp4', 'video_profile': '超清'},
{'id': 'mp4hd', 'container': 'mp4', 'video_profile': '高清'},
# not really equivalent to mp4hd
{'id': 'flvhd', 'container': 'flv', 'video_profile': '渣清'},
{'id': '3gphd', 'container': 'mp4', 'video_profile': '渣清'},
{'id': 'mp4sd', 'container': 'mp4', 'video_profile': '标清'},
# obsolete?
{'id': 'flv', 'container': 'flv', 'video_profile': '标清'},
{'id': '3gphd', 'container': '3gp', 'video_profile': '标清3GP'},
{'id': 'mp4', 'container': 'mp4', 'video_profile': '标清'},
]
f_code_1 = 'becaf9be'
f_code_2 = 'bf7e5f01'
def __init__(self):
super().__init__()
ctype = 12 #differ from 86
self.ua = self.__class__.mobile_ua
self.referer = 'http://v.youku.com'
def trans_e(a, c):
"""str, str->str
This is an RC4 encryption."""
f = h = 0
b = list(range(256))
result = ''
while h < 256:
f = (f + b[h] + ord(a[h % len(a)])) % 256
b[h], b[f] = b[f], b[h]
h += 1
q = f = h = 0
while q < len(c):
h = (h + 1) % 256
f = (f + b[h]) % 256
b[h], b[f] = b[f], b[h]
if isinstance(c[q], int):
result += chr(c[q] ^ b[(b[h] + b[f]) % 256])
self.page = None
self.video_list = None
self.video_next = None
self.password = None
self.api_data = None
self.api_error_code = None
self.api_error_msg = None
self.ccode = '0590'
# Found in http://g.alicdn.com/player/ykplayer/0.5.64/youku-player.min.js
# grep -oE '"[0-9a-zA-Z+/=]{256}"' youku-player.min.js
self.ckey = 'DIl58SLFxFNndSV1GFNnMQVYkx1PP5tKe1siZu/86PR1u/Wh1Ptd+WOZsHHWxysSfAOhNJpdVWsdVJNsfJ8Sxd8WKVvNfAS8aS8fAOzYARzPyPc3JvtnPHjTdKfESTdnuTW6ZPvk2pNDh4uFzotgdMEFkzQ5wZVXl2Pf1/Y6hLK0OnCNxBj3+nb0v72gZ6b0td+WOZsHHWxysSo/0y9D2K42SaB8Y/+aD2K42SaB8Y/+ahU+WOZsHcrxysooUeND'
self.utid = None
def youku_ups(self):
url = 'https://ups.youku.com/ups/get.json?vid={}&ccode={}'.format(self.vid, self.ccode)
url += '&client_ip=192.168.1.1'
url += '&utid=' + self.utid
url += '&client_ts=' + str(int(time.time()))
url += '&ckey=' + urllib.parse.quote(self.ckey)
if self.password_protected:
url += '&password=' + self.password
headers = dict(Referer=self.referer)
headers['User-Agent'] = self.ua
api_meta = json.loads(get_content(url, headers=headers))
self.api_data = api_meta['data']
data_error = self.api_data.get('error')
if data_error:
self.api_error_code = data_error.get('code')
self.api_error_msg = data_error.get('note')
if 'videos' in self.api_data:
if 'list' in self.api_data['videos']:
self.video_list = self.api_data['videos']['list']
if 'next' in self.api_data['videos']:
self.video_next = self.api_data['videos']['next']
@classmethod
def change_cdn(cls, url):
# If the CDN URL starts with an IP address, it is probably Youku's old CDN,
# which randomly rejects HTTP requests with status codes > 400.
# Switching it to the aliCDN dispatcher works better;
# at least it recovers a little more gracefully from HTTP 403.
if cls.dispatcher_url in url:
return url
elif 'k.youku.com' in url:
return url
else:
result += chr(ord(c[q]) ^ b[(b[h] + b[f]) % 256])
q += 1
url_seg_list = list(urllib.parse.urlsplit(url))
url_seg_list[1] = cls.dispatcher_url
return urllib.parse.urlunsplit(url_seg_list)
return result
def get_vid_from_url(self):
# This is unreliable; see issue #1633.
b64p = r'([a-zA-Z0-9=]+)'
p_list = [r'youku\.com/v_show/id_'+b64p,
r'player\.youku\.com/player\.php/sid/'+b64p+r'/v\.swf',
r'loader\.swf\?VideoIDS='+b64p,
r'player\.youku\.com/embed/'+b64p]
if not self.url:
raise Exception('No url')
for p in p_list:
hit = re.search(p, self.url)
if hit is not None:
self.vid = hit.group(1)
return
def generate_ep(self, no, streamfileids, sid, token):
number = hex(int(str(no), 10))[2:].upper()
if len(number) == 1:
number = '0' + number
fileid = streamfileids[0:8] + number + streamfileids[10:]
ep = parse.quote(base64.b64encode(
''.join(self.__class__.trans_e(
self.f_code_2, # use the 86 f_code if using the ct=86 API
sid + '_' + fileid + '_' + token)).encode('latin1')),
safe='~()*!.\''
)
return fileid, ep
# Obsolete -- used to parse m3u8 on pl.youku.com
def parse_m3u8(m3u8):
return re.findall(r'(http://[^?]+)\?ts_start=0', m3u8)
def oset(xs):
"""Turns a list into an ordered set. (removes duplicates)"""
mem = set()
for x in xs:
if x not in mem:
mem.add(x)
return mem
def get_vid_from_url(url):
"""Extracts video ID from URL.
"""
return match1(url, r'youku\.com/v_show/id_([a-zA-Z0-9=]+)') or \
match1(url, r'player\.youku\.com/player\.php/sid/([a-zA-Z0-9=]+)/v\.swf') or \
match1(url, r'loader\.swf\?VideoIDS=([a-zA-Z0-9=]+)') or \
match1(url, r'player\.youku\.com/embed/([a-zA-Z0-9=]+)')
def get_playlist_id_from_url(url):
"""Extracts playlist ID from URL.
"""
return match1(url, r'youku\.com/albumlist/show\?id=([a-zA-Z0-9=]+)')
def download_playlist_by_url(self, url, **kwargs):
self.url = url
try:
playlist_id = self.__class__.get_playlist_id_from_url(self.url)
assert playlist_id
video_page = get_content('http://list.youku.com/albumlist/show?id=%s' % playlist_id)
videos = Youku.oset(re.findall(r'href="(http://v\.youku\.com/[^?"]+)', video_page))
# Parse multi-page playlists
last_page_url = re.findall(r'href="(/albumlist/show\?id=%s[^"]+)" title="末页"' % playlist_id, video_page)[0]
num_pages = int(re.findall(r'page=([0-9]+)\.htm', last_page_url)[0])
if (num_pages > 0):
# download one by one
for pn in range(2, num_pages + 1):
extra_page_url = re.sub(r'page=([0-9]+)\.htm', r'page=%s.htm' % pn, last_page_url)
extra_page = get_content('http://list.youku.com' + extra_page_url)
videos |= Youku.oset(re.findall(r'href="(http://v\.youku\.com/[^?"]+)', extra_page))
except:
# Show full list of episodes
if match1(url, r'youku\.com/show_page/id_([a-zA-Z0-9=]+)'):
ep_id = match1(url, r'youku\.com/show_page/id_([a-zA-Z0-9=]+)')
url = 'http://www.youku.com/show_episode/id_%s' % ep_id
video_page = get_content(url)
videos = Youku.oset(re.findall(r'href="(http://v\.youku\.com/[^?"]+)', video_page))
self.title = r1(r'<meta name="title" content="([^"]+)"', video_page) or \
r1(r'<title>([^<]+)', video_page)
self.p_playlist()
for video in videos:
index = parse_query_param(video, 'f')
try:
self.__class__().download_by_url(video, index=index, **kwargs)
except KeyboardInterrupt:
raise
except:
exc_type, exc_value, exc_traceback = sys.exc_info()
traceback.print_exception(exc_type, exc_value, exc_traceback)
def get_vid_from_page(self):
if not self.url:
raise Exception('No url')
self.page = get_content(self.url)
hit = re.search(r'videoId2:"([A-Za-z0-9=]+)"', self.page)
if hit is not None:
self.vid = hit.group(1)
def prepare(self, **kwargs):
# Hot-plug cookie handler
ssl_context = request.HTTPSHandler(
context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
cookie_handler = request.HTTPCookieProcessor()
if 'extractor_proxy' in kwargs and kwargs['extractor_proxy']:
proxy = parse_host(kwargs['extractor_proxy'])
proxy_handler = request.ProxyHandler({
'http': '%s:%s' % proxy,
'https': '%s:%s' % proxy,
})
else:
proxy_handler = request.ProxyHandler({})
opener = request.build_opener(ssl_context, cookie_handler, proxy_handler)
opener.addheaders = [('Cookie','__ysuid={}'.format(time.time()))]
request.install_opener(opener)
assert self.url or self.vid
if self.url and not self.vid:
self.vid = self.__class__.get_vid_from_url(self.url)
self.get_vid_from_url()
if self.vid is None:
self.download_playlist_by_url(self.url, **kwargs)
exit(0)
self.get_vid_from_page()
#HACK!
if 'api_url' in kwargs:
api_url = kwargs['api_url'] #85
api12_url = kwargs['api12_url'] #86
self.ctype = kwargs['ctype']
self.title = kwargs['title']
if self.vid is None:
log.wtf('Cannot fetch vid')
else:
api_url = 'http://play.youku.com/play/get.json?vid=%s&ct=10' % self.vid
api12_url = 'http://play.youku.com/play/get.json?vid=%s&ct=12' % self.vid
if kwargs.get('src') and kwargs['src'] == 'tudou':
self.ccode = '0512'
try:
meta = json.loads(get_content(
api_url,
headers={'Referer': 'http://static.youku.com/'}
))
meta12 = json.loads(get_content(
api12_url,
headers={'Referer': 'http://static.youku.com/'}
))
data = meta['data']
data12 = meta12['data']
assert 'stream' in data
except AssertionError:
if 'error' in data:
if data['error']['code'] == -202:
# Password protected
if kwargs.get('password') and kwargs['password']:
self.password_protected = True
self.password = kwargs['password']
self.utid = fetch_cna()
time.sleep(3)
self.youku_ups()
if self.api_data.get('stream') is None:
if self.api_error_code == -6001: # wrong vid parsed from the page
vid_from_url = self.vid
self.get_vid_from_page()
if vid_from_url == self.vid:
log.wtf(self.api_error_msg)
self.youku_ups()
if self.api_data.get('stream') is None:
if self.api_error_code == -2002: # wrong password
self.password_protected = True
# it can already be True (from the CLI); offer another chance to retry
self.password = input(log.sprint('Password: ', log.YELLOW))
api_url += '&pwd={}'.format(self.password)
api12_url += '&pwd={}'.format(self.password)
meta = json.loads(get_content(
api_url,
headers={'Referer': 'http://static.youku.com/'}
))
meta12 = json.loads(get_content(
api12_url,
headers={'Referer': 'http://static.youku.com/'}
))
data = meta['data']
data12 = meta12['data']
self.youku_ups()
if self.api_data.get('stream') is None:
if self.api_error_msg:
log.wtf(self.api_error_msg)
else:
log.wtf('[Failed] ' + data['error']['note'])
else:
log.wtf('[Failed] Video not found.')
if not self.title: #86
self.title = data['video']['title']
self.ep = data12['security']['encrypt_string']
self.ip = data12['security']['ip']
if 'stream' not in data and self.password_protected:
log.wtf('[Failed] Wrong password.')
log.wtf('Unknown error')
self.title = self.api_data['video']['title']
stream_types = dict([(i['id'], i) for i in self.stream_types])
audio_lang = data['stream'][0]['audio_lang']
audio_lang = self.api_data['stream'][0]['audio_lang']
for stream in data['stream']:
for stream in self.api_data['stream']:
stream_id = stream['stream_type']
is_preview = False
if stream_id in stream_types and stream['audio_lang'] == audio_lang:
if 'alias-of' in stream_types[stream_id]:
stream_id = stream_types[stream_id]['alias-of']
@ -225,175 +205,111 @@ class Youku(VideoExtractor):
'video_profile': stream_types[stream_id]['video_profile'],
'size': stream['size'],
'pieces': [{
'fileid': stream['stream_fileid'],
'segs': stream['segs']
}]
}],
'm3u8_url': stream['m3u8_url']
}
src = []
for seg in stream['segs']:
if seg.get('cdn_url'):
src.append(self.__class__.change_cdn(seg['cdn_url']))
else:
is_preview = True
self.streams[stream_id]['src'] = src
else:
self.streams[stream_id]['size'] += stream['size']
self.streams[stream_id]['pieces'].append({
'fileid': stream['stream_fileid'],
'segs': stream['segs']
})
self.streams_fallback = {}
for stream in data12['stream']:
stream_id = stream['stream_type']
if stream_id in stream_types and stream['audio_lang'] == audio_lang:
if 'alias-of' in stream_types[stream_id]:
stream_id = stream_types[stream_id]['alias-of']
if stream_id not in self.streams_fallback:
self.streams_fallback[stream_id] = {
'container': stream_types[stream_id]['container'],
'video_profile': stream_types[stream_id]['video_profile'],
'size': stream['size'],
'pieces': [{
'fileid': stream['stream_fileid'],
'segs': stream['segs']
}]
}
src = []
for seg in stream['segs']:
if seg.get('cdn_url'):
src.append(self.__class__.change_cdn(seg['cdn_url']))
else:
self.streams_fallback[stream_id]['size'] += stream['size']
self.streams_fallback[stream_id]['pieces'].append({
'fileid': stream['stream_fileid'],
'segs': stream['segs']
})
is_preview = True
self.streams[stream_id]['src'].extend(src)
if is_preview:
log.w('{} is a preview'.format(stream_id))
# Audio languages
if 'dvd' in data and 'audiolang' in data['dvd']:
self.audiolang = data['dvd']['audiolang']
if 'dvd' in self.api_data:
al = self.api_data['dvd'].get('audiolang')
if al:
self.audiolang = al
for i in self.audiolang:
i['url'] = 'http://v.youku.com/v_show/id_{}'.format(i['vid'])
def extract(self, **kwargs):
if 'stream_id' in kwargs and kwargs['stream_id']:
# Extract the stream
stream_id = kwargs['stream_id']
if stream_id not in self.streams:
log.e('[Error] Invalid video format.')
log.e('Run \'-i\' command with no specific video format to view all available formats.')
exit(2)
else:
# Extract stream with the best quality
stream_id = self.streams_sorted[0]['id']
e_code = self.__class__.trans_e(
self.f_code_1,
base64.b64decode(bytes(self.ep, 'ascii'))
)
sid, token = e_code.split('_')
while True:
def youku_download_playlist_by_url(url, **kwargs):
video_page_pt = 'https?://v.youku.com/v_show/id_([A-Za-z0-9=]+)'
js_cb_pt = '\(({.+})\)'
if re.match(video_page_pt, url):
youku_obj = Youku()
youku_obj.url = url
youku_obj.prepare(**kwargs)
total_episode = None
try:
ksegs = []
pieces = self.streams[stream_id]['pieces']
for piece in pieces:
segs = piece['segs']
streamfileid = piece['fileid']
for no in range(0, len(segs)):
k = segs[no]['key']
if k == -1: break # we hit the paywall; stop here
fileid, ep = self.__class__.generate_ep(self, no, streamfileid,
sid, token)
q = parse.urlencode(dict(
ctype = self.ctype,
ev = 1,
K = k,
ep = parse.unquote(ep),
oip = str(self.ip),
token = token,
yxon = 1
))
u = 'http://k.youku.com/player/getFlvPath/sid/{sid}_00' \
'/st/{container}/fileid/{fileid}?{q}'.format(
sid = sid,
container = self.streams[stream_id]['container'],
fileid = fileid,
q = q
)
ksegs += [i['server'] for i in json.loads(get_content(u))]
if (parse_host(ksegs[len(ksegs)-1])[0] == "vali.cp31.ott.cibntv.net"):
ksegs.pop(len(ksegs)-1)
except error.HTTPError as e:
# Use fallback stream data in case of HTTP 404
log.e('[Error] ' + str(e))
self.streams = {}
self.streams = self.streams_fallback
total_episode = youku_obj.api_data['show']['episode_total']
except KeyError:
# Move on to next stream if best quality not available
del self.streams_sorted[0]
stream_id = self.streams_sorted[0]['id']
else: break
if not kwargs['info_only']:
self.streams[stream_id]['src'] = ksegs
def open_download_by_vid(self, client_id, vid, **kwargs):
"""self, str, str, **kwargs->None
Arguments:
client_id: a per-client ID. For now we only know Acfun's
such ID.
vid: a video ID for each video; it starts with "C".
kwargs['embsig']: Youku COOP's anti-hotlinking parameter.
For Acfun, an API call must be made to Acfun's
server, otherwise the "playsign" in the content of sign_url
will be empty.
Misc:
Overrides the original method in VideoExtractor.
Author:
Most of the credit goes to @ERioK, who provided the POC.
History:
Jul.28.2016 Youku COOP now has anti-hotlinking via embsig. """
self.f_code_1 = '10ehfkbv' # can be retrieved by running r.translate with the keys and the list e
self.f_code_2 = 'msjv7h2b'
# as in VideoExtractor
self.url = None
self.vid = vid
self.name = "优酷开放平台 (Youku COOP)"
# A little bit of work before self.prepare
# Changed as of Jul.28.2016: Youku COOP updated its platform to add anti-hotlinking
if kwargs['embsig']:
sign_url = "https://api.youku.com/players/custom.json?client_id={client_id}&video_id={video_id}&embsig={embsig}".format(client_id = client_id, video_id = vid, embsig = kwargs['embsig'])
log.wtf('Cannot get total_episode for {}'.format(url))
next_vid = youku_obj.vid
for _ in range(total_episode):
this_extractor = Youku()
this_extractor.download_by_vid(next_vid, keep_obj=True, **kwargs)
next_vid = this_extractor.video_next['encodevid']
'''
if youku_obj.video_list is None:
log.wtf('Cannot find video list for {}'.format(url))
else:
sign_url = "https://api.youku.com/players/custom.json?client_id={client_id}&video_id={video_id}".format(client_id = client_id, video_id = vid)
vid_list = [v['encodevid'] for v in youku_obj.video_list]
for v in vid_list:
Youku().download_by_vid(v, **kwargs)
'''
playsign = json.loads(get_content(sign_url))['playsign']
elif re.match('https?://list.youku.com/show/id_', url):
# http://list.youku.com/show/id_z2ae8ee1c837b11e18195.html
# official playlist
page = get_content(url)
show_id = re.search(r'showid:"(\d+)"', page).group(1)
ep = 'http://list.youku.com/show/module?id={}&tab=showInfo&callback=jQuery'.format(show_id)
xhr_page = get_content(ep).replace('\/', '/').replace('\"', '"')
video_url = re.search(r'(v.youku.com/v_show/id_(?:[A-Za-z0-9=]+)\.html)', xhr_page).group(1)
youku_download_playlist_by_url('http://'+video_url, **kwargs)
return
elif re.match('https?://list.youku.com/albumlist/show/id_(\d+)\.html', url):
# http://list.youku.com/albumlist/show/id_2336634.html
# UGC playlist
list_id = re.search('https?://list.youku.com/albumlist/show/id_(\d+)\.html', url).group(1)
ep = 'http://list.youku.com/albumlist/items?id={}&page={}&size=20&ascending=1&callback=tuijsonp6'
# to be injected, replacing ct=10 and ct=12
api85_url = 'http://play.youku.com/partner/get.json?cid={client_id}&vid={vid}&ct=85&sign={playsign}'.format(client_id = client_id, vid = vid, playsign = playsign)
api86_url = 'http://play.youku.com/partner/get.json?cid={client_id}&vid={vid}&ct=86&sign={playsign}'.format(client_id = client_id, vid = vid, playsign = playsign)
first_u = ep.format(list_id, 1)
xhr_page = get_content(first_u)
json_data = json.loads(re.search(js_cb_pt, xhr_page).group(1))
video_cnt = json_data['data']['total']
xhr_html = json_data['html']
v_urls = re.findall(r'(v.youku.com/v_show/id_(?:[A-Za-z0-9=]+)\.html)', xhr_html)
self.prepare(api_url = api85_url, api12_url = api86_url, ctype = 86, **kwargs)
if video_cnt > 20:
req_cnt = video_cnt // 20
for i in range(2, req_cnt+2):
req_u = ep.format(list_id, i)
xhr_page = get_content(req_u)
json_data = json.loads(re.search(js_cb_pt, xhr_page).group(1).replace('\/', '/'))
xhr_html = json_data['html']
page_videos = re.findall(r'(v.youku.com/v_show/id_(?:[A-Za-z0-9=]+)\.html)', xhr_html)
v_urls.extend(page_videos)
for u in v_urls[0::2]:
url = 'http://' + u
Youku().download_by_url(url, **kwargs)
return
#exact copy from original VideoExtractor
if 'extractor_proxy' in kwargs and kwargs['extractor_proxy']:
unset_proxy()
try:
self.streams_sorted = [dict([('id', stream_type['id'])] + list(self.streams[stream_type['id']].items())) for stream_type in self.__class__.stream_types if stream_type['id'] in self.streams]
except:
self.streams_sorted = [dict([('itag', stream_type['itag'])] + list(self.streams[stream_type['itag']].items())) for stream_type in self.__class__.stream_types if stream_type['itag'] in self.streams]
def youku_download_by_url(url, **kwargs):
Youku().download_by_url(url, **kwargs)
self.extract(**kwargs)
self.download(**kwargs)
def youku_download_by_vid(vid, **kwargs):
Youku().download_by_vid(vid, **kwargs)
site = Youku()
download = site.download_by_url
download_playlist = site.download_playlist_by_url
youku_download_by_vid = site.download_by_vid
youku_open_download_by_vid = site.open_download_by_vid
# Used by: acfun.py bilibili.py miomio.py tudou.py
download = youku_download_by_url
download_playlist = youku_download_playlist_by_url

View File

@ -8,35 +8,74 @@ from xml.dom.minidom import parseString
class YouTube(VideoExtractor):
name = "YouTube"
# YouTube media encoding options, in descending quality order.
# Non-DASH YouTube media encoding options, in descending quality order.
# http://en.wikipedia.org/wiki/YouTube#Quality_and_codecs. Retrieved July 17, 2014.
stream_types = [
{'itag': '38', 'container': 'MP4', 'video_resolution': '3072p', 'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3.5-5', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
{'itag': '38', 'container': 'MP4', 'video_resolution': '3072p',
'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3.5-5',
'audio_encoding': 'AAC', 'audio_bitrate': '192'},
#{'itag': '85', 'container': 'MP4', 'video_resolution': '1080p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '3-4', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
{'itag': '46', 'container': 'WebM', 'video_resolution': '1080p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
{'itag': '37', 'container': 'MP4', 'video_resolution': '1080p', 'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3-4.3', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
{'itag': '46', 'container': 'WebM', 'video_resolution': '1080p',
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '',
'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
{'itag': '37', 'container': 'MP4', 'video_resolution': '1080p',
'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3-4.3',
'audio_encoding': 'AAC', 'audio_bitrate': '192'},
#{'itag': '102', 'container': 'WebM', 'video_resolution': '720p', 'video_encoding': 'VP8', 'video_profile': '3D', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
{'itag': '45', 'container': 'WebM', 'video_resolution': '720p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '2', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
{'itag': '45', 'container': 'WebM', 'video_resolution': '720p',
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '2',
'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
#{'itag': '84', 'container': 'MP4', 'video_resolution': '720p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '2-3', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
{'itag': '22', 'container': 'MP4', 'video_resolution': '720p', 'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '2-3', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
{'itag': '120', 'container': 'FLV', 'video_resolution': '720p', 'video_encoding': 'H.264', 'video_profile': 'Main@L3.1', 'video_bitrate': '2', 'audio_encoding': 'AAC', 'audio_bitrate': '128'}, # Live streaming only
{'itag': '44', 'container': 'WebM', 'video_resolution': '480p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '1', 'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
{'itag': '35', 'container': 'FLV', 'video_resolution': '480p', 'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.8-1', 'audio_encoding': 'AAC', 'audio_bitrate': '128'},
{'itag': '22', 'container': 'MP4', 'video_resolution': '720p',
'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '2-3',
'audio_encoding': 'AAC', 'audio_bitrate': '192'},
{'itag': '120', 'container': 'FLV', 'video_resolution': '720p',
'video_encoding': 'H.264', 'video_profile': 'Main@L3.1', 'video_bitrate': '2',
'audio_encoding': 'AAC', 'audio_bitrate': '128'}, # Live streaming only
{'itag': '44', 'container': 'WebM', 'video_resolution': '480p',
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '1',
'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
{'itag': '35', 'container': 'FLV', 'video_resolution': '480p',
'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.8-1',
'audio_encoding': 'AAC', 'audio_bitrate': '128'},
#{'itag': '101', 'container': 'WebM', 'video_resolution': '360p', 'video_encoding': 'VP8', 'video_profile': '3D', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
#{'itag': '100', 'container': 'WebM', 'video_resolution': '360p', 'video_encoding': 'VP8', 'video_profile': '3D', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
{'itag': '43', 'container': 'WebM', 'video_resolution': '360p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '0.5', 'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
{'itag': '34', 'container': 'FLV', 'video_resolution': '360p', 'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '128'},
{'itag': '43', 'container': 'WebM', 'video_resolution': '360p',
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '0.5',
'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
{'itag': '34', 'container': 'FLV', 'video_resolution': '360p',
'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.5',
'audio_encoding': 'AAC', 'audio_bitrate': '128'},
#{'itag': '82', 'container': 'MP4', 'video_resolution': '360p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '96'},
{'itag': '18', 'container': 'MP4', 'video_resolution': '270p/360p', 'video_encoding': 'H.264', 'video_profile': 'Baseline', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '96'},
{'itag': '6', 'container': 'FLV', 'video_resolution': '270p', 'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.8', 'audio_encoding': 'MP3', 'audio_bitrate': '64'},
{'itag': '18', 'container': 'MP4', 'video_resolution': '360p',
'video_encoding': 'H.264', 'video_profile': 'Baseline', 'video_bitrate': '0.5',
'audio_encoding': 'AAC', 'audio_bitrate': '96'},
{'itag': '6', 'container': 'FLV', 'video_resolution': '270p',
'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.8',
'audio_encoding': 'MP3', 'audio_bitrate': '64'},
#{'itag': '83', 'container': 'MP4', 'video_resolution': '240p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '96'},
{'itag': '13', 'container': '3GP', 'video_resolution': '', 'video_encoding': 'MPEG-4 Visual', 'video_profile': '', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': ''},
{'itag': '5', 'container': 'FLV', 'video_resolution': '240p', 'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.25', 'audio_encoding': 'MP3', 'audio_bitrate': '64'},
{'itag': '36', 'container': '3GP', 'video_resolution': '240p', 'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.175', 'audio_encoding': 'AAC', 'audio_bitrate': '36'},
{'itag': '17', 'container': '3GP', 'video_resolution': '144p', 'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.05', 'audio_encoding': 'AAC', 'audio_bitrate': '24'},
{'itag': '13', 'container': '3GP', 'video_resolution': '',
'video_encoding': 'MPEG-4 Visual', 'video_profile': '', 'video_bitrate': '0.5',
'audio_encoding': 'AAC', 'audio_bitrate': ''},
{'itag': '5', 'container': 'FLV', 'video_resolution': '240p',
'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.25',
'audio_encoding': 'MP3', 'audio_bitrate': '64'},
{'itag': '36', 'container': '3GP', 'video_resolution': '240p',
'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.175',
'audio_encoding': 'AAC', 'audio_bitrate': '32'},
{'itag': '17', 'container': '3GP', 'video_resolution': '144p',
'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.05',
'audio_encoding': 'AAC', 'audio_bitrate': '24'},
]
def decipher(js, s):
def s_to_sig(js, s):
# Examples:
# - https://www.youtube.com/yts/jsbin/player-da_DK-vflWlK-zq/base.js
# - https://www.youtube.com/yts/jsbin/player-vflvABTsY/da_DK/base.js
# - https://www.youtube.com/yts/jsbin/player-vfls4aurX/da_DK/base.js
# - https://www.youtube.com/yts/jsbin/player_ias-vfl_RGK2l/en_US/base.js
# - https://www.youtube.com/yts/jsbin/player-vflRjqq_w/da_DK/base.js
# - https://www.youtube.com/yts/jsbin/player_ias-vfl-jbnrr/da_DK/base.js
def tr_js(code):
code = re.sub(r'function', r'def', code)
code = re.sub(r'(\W)(as|if|in|is|or)\(', r'\1_\2(', code)
@ -52,11 +91,15 @@ class YouTube(VideoExtractor):
return code
js = js.replace('\n', ' ')
f1 = match1(js, r'\w+\.sig\|\|([$\w]+)\(\w+\.\w+\)')
f1 = match1(js, r'\.set\(\w+\.sp,encodeURIComponent\(([$\w]+)') or \
match1(js, r'\.set\(\w+\.sp,\(0,window\.encodeURIComponent\)\(([$\w]+)') or \
match1(js, r'\.set\(\w+\.sp,([$\w]+)\(\w+\.s\)\)') or \
match1(js, r'"signature",([$\w]+)\(\w+\.\w+\)') or \
match1(js, r'=([$\w]+)\(decodeURIComponent\(')
f1def = match1(js, r'function %s(\(\w+\)\{[^\{]+\})' % re.escape(f1)) or \
match1(js, r'\W%s=function(\(\w+\)\{[^\{]+\})' % re.escape(f1))
f1def = re.sub(r'([$\w]+\.)([$\w]+\(\w+,\d+\))', r'\2', f1def)
f1def = 'function %s%s' % (f1, f1def)
f1def = 'function main_%s%s' % (f1, f1def) # prefix to avoid potential namespace conflict
code = tr_js(f1def)
f2s = set(re.findall(r'([$\w]+)\(\w+,\d+\)', f1def))
for f2 in f2s:
@ -67,23 +110,33 @@ class YouTube(VideoExtractor):
else:
f2def = re.search(r'[^$\w]%s:function\((\w+)\)(\{[^\{\}]+\})' % f2e, js)
f2def = 'function {}({},b){}'.format(f2e, f2def.group(1), f2def.group(2))
f2 = re.sub(r'(\W)(as|if|in|is|or)\(', r'\1_\2(', f2)
f2 = re.sub(r'(as|if|in|is|or)', r'_\1', f2)
f2 = re.sub(r'\$', '_dollar', f2)
code = code + 'global %s\n' % f2 + tr_js(f2def)
f1 = re.sub(r'(as|if|in|is|or)', r'_\1', f1)
f1 = re.sub(r'\$', '_dollar', f1)
code = code + 'sig=%s(s)' % f1
code = code + 'sig=main_%s(s)' % f1 # prefix to avoid potential namespace conflict
exec(code, globals(), locals())
return locals()['sig']
def chunk_by_range(url, size):
urls = []
chunk_size = 10485760
start, end = 0, chunk_size - 1
urls.append('%s&range=%s-%s' % (url, start, end))
while end + 1 < size: # processed size < expected size
start, end = end + 1, end + chunk_size
urls.append('%s&range=%s-%s' % (url, start, end))
return urls
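# A quick illustration (URL and size are hypothetical):
#   chunk_by_range('https://example.com/videoplayback?itag=22', 25000000)
# returns three 10 MiB range URLs:
#   ...videoplayback?itag=22&range=0-10485759
#   ...videoplayback?itag=22&range=10485760-20971519
#   ...videoplayback?itag=22&range=20971520-31457279
# (the last range runs past `size`, covering the remainder of the file)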
def get_url_from_vid(vid):
return 'https://youtu.be/{}'.format(vid)
def get_vid_from_url(url):
"""Extracts video ID from URL.
"""
return match1(url, r'youtu\.be/([^/]+)') or \
return match1(url, r'youtu\.be/([^?/]+)') or \
match1(url, r'youtube\.com/embed/([^/?]+)') or \
match1(url, r'youtube\.com/v/([^/?]+)') or \
match1(url, r'youtube\.com/watch/([^/?]+)') or \
@ -104,31 +157,22 @@ class YouTube(VideoExtractor):
log.wtf('[Failed] Unsupported URL pattern.')
video_page = get_content('https://www.youtube.com/playlist?list=%s' % playlist_id)
from html.parser import HTMLParser
videos = sorted([HTMLParser().unescape(video)
for video in re.findall(r'<a href="(/watch\?[^"]+)"', video_page)
if parse_query_param(video, 'index')],
key=lambda video: parse_query_param(video, 'index'))
ytInitialData = json.loads(match1(video_page, r'window\["ytInitialData"\]\s*=\s*(.+);'))
# Parse browse_ajax page for more videos to load
load_more_href = match1(video_page, r'data-uix-load-more-href="([^"]+)"')
while load_more_href:
browse_ajax = get_content('https://www.youtube.com/%s' % load_more_href)
browse_data = json.loads(browse_ajax)
load_more_widget_html = browse_data['load_more_widget_html']
content_html = browse_data['content_html']
vs = set(re.findall(r'href="(/watch\?[^"]+)"', content_html))
videos += sorted([HTMLParser().unescape(video)
for video in list(vs)
if parse_query_param(video, 'index')])
load_more_href = match1(load_more_widget_html, r'data-uix-load-more-href="([^"]+)"')
tab0 = ytInitialData['contents']['twoColumnBrowseResultsRenderer']['tabs'][0]
itemSection0 = tab0['tabRenderer']['content']['sectionListRenderer']['contents'][0]
playlistVideoList0 = itemSection0['itemSectionRenderer']['contents'][0]
videos = playlistVideoList0['playlistVideoListRenderer']['contents']
self.title = re.search(r'<meta name="title" content="([^"]+)"', video_page).group(1)
self.p_playlist()
for video in videos:
vid = parse_query_param(video, 'v')
index = parse_query_param(video, 'index')
for index, video in enumerate(videos, 1):
vid = video['playlistVideoRenderer']['videoId']
try:
self.__class__().download_by_url(self.__class__.get_url_from_vid(vid), index=index, **kwargs)
except:
pass
# FIXME: show DASH stream sizes (by default) for playlist videos
def prepare(self, **kwargs):
assert self.url or self.vid
@ -140,22 +184,50 @@ class YouTube(VideoExtractor):
self.download_playlist_by_url(self.url, **kwargs)
exit(0)
video_info = parse.parse_qs(get_content('https://www.youtube.com/get_video_info?video_id={}'.format(self.vid)))
if re.search('\Wlist=', self.url) and not kwargs.get('playlist'):
log.w('This video is from a playlist. (use --playlist to download all videos in the playlist.)')
# Get video info
# 'eurl' is a magic parameter that can bypass age restriction
# full form: 'eurl=https%3A%2F%2Fyoutube.googleapis.com%2Fv%2F{VIDEO_ID}'
video_info = parse.parse_qs(get_content('https://www.youtube.com/get_video_info?video_id={}&eurl=https%3A%2F%2Fy'.format(self.vid)))
logging.debug('STATUS: %s' % video_info['status'][0])
ytplayer_config = None
if 'status' not in video_info:
log.wtf('[Failed] Unknown status.')
log.wtf('[Failed] Unknown status.', exit_code=None)
raise
elif video_info['status'] == ['ok']:
if 'use_cipher_signature' not in video_info or video_info['use_cipher_signature'] == ['False']:
self.title = parse.unquote_plus(video_info['title'][0])
stream_list = video_info['url_encoded_fmt_stream_map'][0].split(',')
self.title = parse.unquote_plus(json.loads(video_info["player_response"][0])["videoDetails"]["title"])
# Parse video page (for DASH)
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
try:
ytplayer_config = json.loads(re.search('ytplayer.config\s*=\s*([^\n]+?});', video_page).group(1))
self.html5player = 'https:' + ytplayer_config['assets']['js']
# Workaround: get_video_info returns bad s. Why?
if 'url_encoded_fmt_stream_map' not in ytplayer_config['args']:
stream_list = json.loads(ytplayer_config['args']['player_response'])['streamingData']['formats']
else:
stream_list = ytplayer_config['args']['url_encoded_fmt_stream_map'].split(',')
#stream_list = ytplayer_config['args']['adaptive_fmts'].split(',')
if 'assets' in ytplayer_config:
self.html5player = 'https://www.youtube.com' + ytplayer_config['assets']['js']
elif re.search('([^"]*/base\.js)"', video_page):
self.html5player = 'https://www.youtube.com' + re.search('([^"]*/base\.js)"', video_page).group(1)
self.html5player = self.html5player.replace('\/', '/') # unescape URL
else:
self.html5player = None
except:
if 'url_encoded_fmt_stream_map' not in video_info:
stream_list = json.loads(video_info['player_response'][0])['streamingData']['formats']
else:
stream_list = video_info['url_encoded_fmt_stream_map'][0].split(',')
if re.search('([^"]*/base\.js)"', video_page):
self.html5player = 'https://www.youtube.com' + re.search('([^"]*/base\.js)"', video_page).group(1)
else:
self.html5player = None
else:
@ -163,40 +235,78 @@ class YouTube(VideoExtractor):
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
ytplayer_config = json.loads(re.search('ytplayer.config\s*=\s*([^\n]+?});', video_page).group(1))
self.title = ytplayer_config['args']['title']
self.html5player = 'https:' + ytplayer_config['assets']['js']
self.title = json.loads(ytplayer_config["args"]["player_response"])["videoDetails"]["title"]
self.html5player = 'https://www.youtube.com' + ytplayer_config['assets']['js']
stream_list = ytplayer_config['args']['url_encoded_fmt_stream_map'].split(',')
elif video_info['status'] == ['fail']:
logging.debug('ERRORCODE: %s' % video_info['errorcode'][0])
if video_info['errorcode'] == ['150']:
# FIXME: still relevant?
if cookies:
# Load necessary cookies into headers (for age-restricted videos)
consent, ssid, hsid, sid = 'YES', '', '', ''
for cookie in cookies:
if cookie.domain.endswith('.youtube.com'):
if cookie.name == 'SSID':
ssid = cookie.value
elif cookie.name == 'HSID':
hsid = cookie.value
elif cookie.name == 'SID':
sid = cookie.value
cookie_str = 'CONSENT=%s; SSID=%s; HSID=%s; SID=%s' % (consent, ssid, hsid, sid)
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid,
headers={'Cookie': cookie_str})
else:
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
try:
ytplayer_config = json.loads(re.search('ytplayer.config\s*=\s*([^\n]+});ytplayer', video_page).group(1))
except:
msg = re.search('class="message">([^<]+)<', video_page).group(1)
log.wtf('[Failed] "%s"' % msg.strip())
log.wtf('[Failed] Got message "%s". Try to login with --cookies.' % msg.strip())
if 'title' in ytplayer_config['args']:
# 150 Restricted from playback on certain sites
# Parse video page instead
self.title = ytplayer_config['args']['title']
self.html5player = 'https:' + ytplayer_config['assets']['js']
self.html5player = 'https://www.youtube.com' + ytplayer_config['assets']['js']
stream_list = ytplayer_config['args']['url_encoded_fmt_stream_map'].split(',')
else:
log.wtf('[Error] The uploader has not made this video available in your country.')
log.wtf('[Error] The uploader has not made this video available in your country.', exit_code=None)
raise
#self.title = re.search('<meta name="title" content="([^"]+)"', video_page).group(1)
#stream_list = []
elif video_info['errorcode'] == ['100']:
log.wtf('[Failed] This video does not exist.', exit_code=int(video_info['errorcode'][0]))
log.wtf('[Failed] This video does not exist.', exit_code=None) #int(video_info['errorcode'][0])
raise
else:
log.wtf('[Failed] %s' % video_info['reason'][0], exit_code=int(video_info['errorcode'][0]))
log.wtf('[Failed] %s' % video_info['reason'][0], exit_code=None) #int(video_info['errorcode'][0])
raise
else:
log.wtf('[Failed] Invalid status.')
log.wtf('[Failed] Invalid status.', exit_code=None)
raise
# YouTube Live
if ytplayer_config and (ytplayer_config['args'].get('livestream') == '1' or ytplayer_config['args'].get('live_playback') == '1'):
if 'hlsvp' in ytplayer_config['args']:
hlsvp = ytplayer_config['args']['hlsvp']
else:
player_response= json.loads(ytplayer_config['args']['player_response'])
log.e('[Failed] %s' % player_response['playabilityStatus']['reason'], exit_code=1)
if 'info_only' in kwargs and kwargs['info_only']:
return
else:
download_url_ffmpeg(hlsvp, self.title, 'mp4')
exit(0)
for stream in stream_list:
if isinstance(stream, str):
metadata = parse.parse_qs(stream)
stream_itag = metadata['itag'][0]
self.streams[stream_itag] = {
@ -204,22 +314,34 @@ class YouTube(VideoExtractor):
'url': metadata['url'][0],
'sig': metadata['sig'][0] if 'sig' in metadata else None,
's': metadata['s'][0] if 's' in metadata else None,
'quality': metadata['quality'][0],
'quality': metadata['quality'][0] if 'quality' in metadata else None,
#'quality': metadata['quality_label'][0] if 'quality_label' in metadata else None,
'type': metadata['type'][0],
'mime': metadata['type'][0].split(';')[0],
'container': mime_to_container(metadata['type'][0].split(';')[0]),
}
else:
stream_itag = str(stream['itag'])
self.streams[stream_itag] = {
'itag': str(stream['itag']),
'url': stream['url'] if 'url' in stream else None,
'sig': None,
's': None,
'quality': stream['quality'],
'type': stream['mimeType'],
'mime': stream['mimeType'].split(';')[0],
'container': mime_to_container(stream['mimeType'].split(';')[0]),
}
if 'signatureCipher' in stream:
self.streams[stream_itag].update(dict([(_.split('=')[0], parse.unquote(_.split('=')[1]))
for _ in stream['signatureCipher'].split('&')]))
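As a rough illustration of what that `signatureCipher` handling does (the sample value below is invented; real values differ):

```
from urllib import parse

# Hypothetical signatureCipher field as it might appear in a player_response format.
signature_cipher = ('s=AbCdEf%3D%3D&sp=sig&url=https%3A%2F%2F'
                    'example.googlevideo.com%2Fvideoplayback%3Fitag%3D22')

# Same split-and-unquote as in the code above: each k=v pair becomes a dict entry.
fields = dict((kv.split('=')[0], parse.unquote(kv.split('=')[1]))
              for kv in signature_cipher.split('&'))
# fields == {'s': 'AbCdEf==', 'sp': 'sig',
#            'url': 'https://example.googlevideo.com/videoplayback?itag=22'}
```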
# Prepare caption tracks
try:
caption_tracks = ytplayer_config['args']['caption_tracks'].split(',')
caption_tracks = json.loads(ytplayer_config['args']['player_response'])['captions']['playerCaptionsTracklistRenderer']['captionTracks']
for ct in caption_tracks:
lang = None
for i in ct.split('&'):
[k, v] = i.split('=')
if k == 'lc' and lang is None: lang = v
if k == 'v' and v[0] != '.': lang = v # auto-generated
if k == 'u': ttsurl = parse.unquote_plus(v)
ttsurl, lang = ct['baseUrl'], ct['languageCode']
tts_xml = parseString(get_content(ttsurl))
transcript = tts_xml.getElementsByTagName('transcript')[0]
texts = transcript.getElementsByTagName('text')
@ -245,7 +367,7 @@ class YouTube(VideoExtractor):
self.caption_tracks[lang] = srt
except: pass
# Prepare DASH streams
# Prepare DASH streams (NOTE: not every video has DASH streams!)
try:
dashmpd = ytplayer_config['args']['dashmpd']
dash_xml = parseString(get_content(dashmpd))
@ -256,11 +378,17 @@ class YouTube(VideoExtractor):
burls = rep.getElementsByTagName('BaseURL')
dash_mp4_a_url = burls[0].firstChild.nodeValue
dash_mp4_a_size = burls[0].getAttribute('yt:contentLength')
if not dash_mp4_a_size:
try: dash_mp4_a_size = url_size(dash_mp4_a_url)
except: continue
elif mimeType == 'audio/webm':
rep = aset.getElementsByTagName('Representation')[-1]
burls = rep.getElementsByTagName('BaseURL')
dash_webm_a_url = burls[0].firstChild.nodeValue
dash_webm_a_size = burls[0].getAttribute('yt:contentLength')
if not dash_webm_a_size:
try: dash_webm_a_size = url_size(dash_webm_a_url)
except: continue
elif mimeType == 'video/mp4':
for rep in aset.getElementsByTagName('Representation'):
w = int(rep.getAttribute('width'))
@ -269,13 +397,18 @@ class YouTube(VideoExtractor):
burls = rep.getElementsByTagName('BaseURL')
dash_url = burls[0].firstChild.nodeValue
dash_size = burls[0].getAttribute('yt:contentLength')
if not dash_size:
try: dash_size = url_size(dash_url)
except: continue
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
dash_mp4_a_urls = self.__class__.chunk_by_range(dash_mp4_a_url, int(dash_mp4_a_size))
self.dash_streams[itag] = {
'quality': '%sx%s' % (w, h),
'itag': itag,
'type': mimeType,
'mime': mimeType,
'container': 'mp4',
'src': [dash_url, dash_mp4_a_url],
'src': [dash_urls, dash_mp4_a_urls],
'size': int(dash_size) + int(dash_mp4_a_size)
}
elif mimeType == 'video/webm':
@ -286,36 +419,85 @@ class YouTube(VideoExtractor):
burls = rep.getElementsByTagName('BaseURL')
dash_url = burls[0].firstChild.nodeValue
dash_size = burls[0].getAttribute('yt:contentLength')
if not dash_size:
try: dash_size = url_size(dash_url)
except: continue
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
dash_webm_a_urls = self.__class__.chunk_by_range(dash_webm_a_url, int(dash_webm_a_size))
self.dash_streams[itag] = {
'quality': '%sx%s' % (w, h),
'itag': itag,
'type': mimeType,
'mime': mimeType,
'container': 'webm',
'src': [dash_url, dash_webm_a_url],
'src': [dash_urls, dash_webm_a_urls],
'size': int(dash_size) + int(dash_webm_a_size)
}
except:
# VEVO
if not self.html5player: return
self.html5player = self.html5player.replace('\/', '/') # unescape URL (for age-restricted videos)
self.js = get_content(self.html5player)
if 'adaptive_fmts' in ytplayer_config['args']:
try:
# Video info from video page (not always available)
streams = [dict([(i.split('=')[0],
parse.unquote(i.split('=')[1]))
for i in afmt.split('&')])
for afmt in ytplayer_config['args']['adaptive_fmts'].split(',')]
except:
if 'adaptive_fmts' in video_info:
streams = [dict([(i.split('=')[0],
parse.unquote(i.split('=')[1]))
for i in afmt.split('&')])
for afmt in video_info['adaptive_fmts'][0].split(',')]
else:
try:
streams = json.loads(video_info['player_response'][0])['streamingData']['adaptiveFormats']
except: # no DASH stream at all
return
# streams without contentLength got broken urls, just remove them (#2767)
streams = [stream for stream in streams if 'contentLength' in stream]
for stream in streams:
stream['itag'] = str(stream['itag'])
if 'qualityLabel' in stream:
stream['quality_label'] = stream['qualityLabel']
del stream['qualityLabel']
if 'width' in stream:
stream['size'] = '{}x{}'.format(stream['width'], stream['height'])
del stream['width']
del stream['height']
stream['type'] = stream['mimeType']
stream['clen'] = stream['contentLength']
stream['init'] = '{}-{}'.format(
stream['initRange']['start'],
stream['initRange']['end'])
stream['index'] = '{}-{}'.format(
stream['indexRange']['start'],
stream['indexRange']['end'])
del stream['mimeType']
del stream['contentLength']
del stream['initRange']
del stream['indexRange']
if 'signatureCipher' in stream:
stream.update(dict([(_.split('=')[0], parse.unquote(_.split('=')[1]))
for _ in stream['signatureCipher'].split('&')]))
del stream['signatureCipher']
for stream in streams: # get over speed limiting
stream['url'] += '&ratebypass=yes'
for stream in streams: # audio
if stream['type'].startswith('audio/mp4'):
dash_mp4_a_url = stream['url']
if 's' in stream:
sig = self.__class__.decipher(self.js, stream['s'])
dash_mp4_a_url += '&signature={}'.format(sig)
sig = self.__class__.s_to_sig(self.js, stream['s'])
dash_mp4_a_url += '&sig={}'.format(sig)
dash_mp4_a_size = stream['clen']
elif stream['type'].startswith('audio/webm'):
dash_webm_a_url = stream['url']
if 's' in stream:
sig = self.__class__.decipher(self.js, stream['s'])
dash_webm_a_url += '&signature={}'.format(sig)
sig = self.__class__.s_to_sig(self.js, stream['s'])
dash_webm_a_url += '&sig={}'.format(sig)
dash_webm_a_size = stream['clen']
for stream in streams: # video
if 'size' in stream:
@ -323,35 +505,47 @@ class YouTube(VideoExtractor):
mimeType = 'video/mp4'
dash_url = stream['url']
if 's' in stream:
sig = self.__class__.decipher(self.js, stream['s'])
dash_url += '&signature={}'.format(sig)
sig = self.__class__.s_to_sig(self.js, stream['s'])
dash_url += '&sig={}'.format(sig)
dash_size = stream['clen']
itag = stream['itag']
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
dash_mp4_a_urls = self.__class__.chunk_by_range(dash_mp4_a_url, int(dash_mp4_a_size))
self.dash_streams[itag] = {
'quality': stream['size'],
'quality': '%s (%s)' % (stream['size'], stream['quality_label']),
'itag': itag,
'type': mimeType,
'mime': mimeType,
'container': 'mp4',
'src': [dash_url, dash_mp4_a_url],
'src': [dash_urls, dash_mp4_a_urls],
'size': int(dash_size) + int(dash_mp4_a_size)
}
elif stream['type'].startswith('video/webm'):
mimeType = 'video/webm'
dash_url = stream['url']
if 's' in stream:
sig = self.__class__.decipher(self.js, stream['s'])
dash_url += '&signature={}'.format(sig)
sig = self.__class__.s_to_sig(self.js, stream['s'])
dash_url += '&sig={}'.format(sig)
dash_size = stream['clen']
itag = stream['itag']
audio_url = None
audio_size = None
try:
audio_url = dash_webm_a_url
audio_size = int(dash_webm_a_size)
except UnboundLocalError as e:
audio_url = dash_mp4_a_url
audio_size = int(dash_mp4_a_size)
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
audio_urls = self.__class__.chunk_by_range(audio_url, int(audio_size))
self.dash_streams[itag] = {
'quality': stream['size'],
'quality': '%s (%s)' % (stream['size'], stream['quality_label']),
'itag': itag,
'type': mimeType,
'mime': mimeType,
'container': 'webm',
'src': [dash_url, dash_webm_a_url],
'size': int(dash_size) + int(dash_webm_a_size)
'src': [dash_urls, audio_urls],
'size': int(dash_size) + int(audio_size)
}
def extract(self, **kwargs):
@ -374,13 +568,13 @@ class YouTube(VideoExtractor):
src = self.streams[stream_id]['url']
if self.streams[stream_id]['sig'] is not None:
sig = self.streams[stream_id]['sig']
src += '&signature={}'.format(sig)
src += '&sig={}'.format(sig)
elif self.streams[stream_id]['s'] is not None:
if not hasattr(self, 'js'):
self.js = get_content(self.html5player)
s = self.streams[stream_id]['s']
sig = self.__class__.decipher(self.js, s)
src += '&signature={}'.format(sig)
sig = self.__class__.s_to_sig(self.js, s)
src += '&sig={}'.format(sig)
self.streams[stream_id]['src'] = [src]
self.streams[stream_id]['size'] = urls_size(self.streams[stream_id]['src'])


@ -3,73 +3,50 @@
__all__ = ['zhanqi_download']
from ..common import *
import re
import base64
import json
import time
import hashlib
import base64
from urllib.parse import urlparse
def zhanqi_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
html = get_content(url)
video_type_patt = r'VideoType":"([^"]+)"'
video_type = match1(html, video_type_patt)
path = urlparse(url).path[1:]
#rtmp_base_patt = r'VideoUrl":"([^"]+)"'
rtmp_id_patt = r'videoId":"([^"]+)"'
vod_m3u8_id_patt = r'VideoID":"([^"]+)"'
title_patt = r'<p class="title-name" title="[^"]+">([^<]+)</p>'
title_patt_backup = r'<title>([^<]{1,9999})</title>'
title = match1(html, title_patt) or match1(html, title_patt_backup)
title = unescape_html(title)
rtmp_base = "http://wshdl.load.cdn.zhanqi.tv/zqlive"
vod_base = "http://dlvod.cdn.zhanqi.tv"
rtmp_real_base = "rtmp://dlrtmp.cdn.zhanqi.tv/zqlive/"
room_info = "http://www.zhanqi.tv/api/static/live.roomid/"
KEY_MASK = "#{&..?!("
ak2_pattern = r'ak2":"\d-([^|]+)'
if not (path.startswith('videos') or path.startswith('v2/videos')): #url = "https://www.zhanqi.tv/huashan?param_s=1_0.2.0"
path_list = path.split('/')
room_id = path_list[1] if path_list[0] == 'topic' else path_list[0]
zhanqi_live(room_id, merge=merge, output_dir=output_dir, info_only=info_only, **kwargs)
else: #url = 'https://www.zhanqi.tv/videos/Lyingman/2017/01/182308.html'
# https://www.zhanqi.tv/v2/videos/215593.html
video_id = path.split('.')[0].split('/')[-1]
zhanqi_video(video_id, merge=merge, output_dir=output_dir, info_only=info_only, **kwargs)
if video_type == "LIVE":
rtmp_id = match1(html, rtmp_id_patt).replace('\\/','/')
#request_url = rtmp_base+'/'+rtmp_id+'.flv?get_url=1'
#real_url = get_html(request_url)
html2 = get_content(room_info + rtmp_id.split("_")[0] + ".json")
json_data = json.loads(html2)
cdns = json_data["data"]["flashvars"]["cdns"]
cdns = base64.b64decode(cdns).decode("utf-8")
cdn = match1(cdns, ak2_pattern)
cdn = base64.b64decode(cdn).decode("utf-8")
key = ''
i = 0
while(i < len(cdn)):
key = key + chr(ord(cdn[i]) ^ ord(KEY_MASK[i % 8]))
i = i + 1
time_hex = hex(int(time.time()))[2:]
key = hashlib.md5(bytes(key + "/zqlive/" + rtmp_id + time_hex, "utf-8")).hexdigest()
real_url = rtmp_real_base + '/' + rtmp_id + "?k=" + key + "&t=" + time_hex
print_info(site_info, title, 'flv', float('inf'))
def zhanqi_live(room_id, merge=True, output_dir='.', info_only=False, **kwargs):
api_url = "https://www.zhanqi.tv/api/static/v2.1/room/domain/{}.json".format(room_id)
json_data = json.loads(get_content(api_url))['data']
status = json_data['status']
if status != '4':
raise Exception("The live stream is not online!")
nickname = json_data['nickname']
title = nickname + ": " + json_data['title']
video_levels = base64.b64decode(json_data['flashvars']['VideoLevels']).decode('utf8')
m3u8_url = json.loads(video_levels)['streamUrl']
print_info(site_info, title, 'm3u8', 0, m3u8_url=m3u8_url, m3u8_type='master')
if not info_only:
download_rtmp_url(real_url, title, 'flv', {}, output_dir, merge = merge)
#download_urls([real_url], title, 'flv', None, output_dir, merge = merge)
elif video_type == "VOD":
vod_m3u8_request = vod_base + match1(html, vod_m3u8_id_patt).replace('\\/','/')
vod_m3u8 = get_html(vod_m3u8_request)
part_url = re.findall(r'(/[^#]+)\.ts',vod_m3u8)
real_url = []
for i in part_url:
i = vod_base + i + ".ts"
real_url.append(i)
type_ = ''
size = 0
for url in real_url:
_, type_, temp = url_info(url)
size += temp or 0
download_url_ffmpeg(m3u8_url, title, 'mp4', output_dir=output_dir, merge=merge)
print_info(site_info, title, type_ or 'ts', size)
def zhanqi_video(video_id, output_dir='.', info_only=False, merge=True, **kwargs):
api_url = 'https://www.zhanqi.tv/api/static/v2.1/video/{}.json'.format(video_id)
json_data = json.loads(get_content(api_url))['data']
title = json_data['title']
vid = json_data['flashvars']['VideoID']
m3u8_url = 'http://dlvod.cdn.zhanqi.tv/' + vid
urls = general_m3u8_extractor(m3u8_url)
print_info(site_info, title, 'm3u8', 0)
if not info_only:
download_urls(real_url, title, type_ or 'ts', size, output_dir, merge = merge)
else:
NotImplementedError('Unknown_video_type')
download_urls(urls, title, 'ts', 0, output_dir=output_dir, merge=merge, **kwargs)
site_info = "zhanqi.tv"
site_info = "www.zhanqi.tv"
download = zhanqi_download
download_playlist = playlist_not_supported('zhanqi')


@ -0,0 +1,55 @@
#!/usr/bin/env python
__all__ = ['zhibo_download']
from ..common import *
def zhibo_vedio_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
# http://video.zhibo.tv/video/details/d103057f-663e-11e8-9d83-525400ccac43.html
html = get_html(url)
title = r1(r'<title>([\s\S]*)</title>', html)
total_size = 0
part_urls= []
video_html = r1(r'<script type="text/javascript">([\s\S]*)</script></head>', html)
# video_guessulike = r1(r"window.xgData =([s\S'\s\.]*)\'\;[\s\S]*window.vouchData", video_html)
video_url = r1(r"window.vurl = \'([s\S'\s\.]*)\'\;[\s\S]*window.imgurl", video_html)
part_urls.append(video_url)
ext = video_url.split('.')[-1]
print_info(site_info, title, ext, total_size)
if not info_only:
download_urls(part_urls, title, ext, total_size, output_dir=output_dir, merge=merge)
def zhibo_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
if 'video.zhibo.tv' in url:
zhibo_vedio_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
return
# if 'v.zhibo.tv' in url:
# http://v.zhibo.tv/31609372
html = get_html(url)
title = r1(r'<title>([\s\S]*)</title>', html)
is_live = r1(r"window.videoIsLive=\'([s\S'\s\.]*)\'\;[\s\S]*window.resDomain", html)
if is_live != "1":
raise ValueError("The live stream is not online! (Errno:%s)" % is_live)
match = re.search(r"""
ourStreamName .*?
'(.*?)' .*?
rtmpHighSource .*?
'(.*?)' .*?
'(.*?)'
""", html, re.S | re.X)
real_url = match.group(3) + match.group(1) + match.group(2)
print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_url_ffmpeg(real_url, title, 'flv', params={}, output_dir=output_dir, merge=merge)
site_info = "zhibo.tv"
download = zhibo_download
download_playlist = playlist_not_supported('zhibo')


@ -0,0 +1,79 @@
#!/usr/bin/env python
__all__ = ['zhihu_download', 'zhihu_download_playlist']
from ..common import *
import json
def zhihu_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
paths = url.split("/")
# question or column
if len(paths) < 3 and len(paths) < 6:
raise TypeError("URL does not conform to specifications, Support column and question only."
"Example URL: https://zhuanlan.zhihu.com/p/51669862 or "
"https://www.zhihu.com/question/267782048/answer/490720324")
if ("question" not in paths or "answer" not in paths) and "zhuanlan.zhihu.com" not in paths:
raise TypeError("URL does not conform to specifications, Support column and question only."
"Example URL: https://zhuanlan.zhihu.com/p/51669862 or "
"https://www.zhihu.com/question/267782048/answer/490720324")
html = get_html(url, faker=True)
title = match1(html, r'data-react-helmet="true">(.*?)</title>')
for index, video_id in enumerate(matchall(html, [r'<a class="video-box" href="\S+video/(\d+)"'])):
try:
video_info = json.loads(
get_content(r"https://lens.zhihu.com/api/videos/{}".format(video_id), headers=fake_headers))
except json.decoder.JSONDecodeError:
log.w("Video id not found:{}".format(video_id))
continue
play_list = video_info["playlist"]
# Prefer "hd" (high definition), then "sd" (standard definition),
# then "ld" (presumably low definition); skip the video if none is available.
data = play_list.get("hd", play_list.get("sd", play_list.get("ld", None)))
if not data:
log.w("Video id No play address:{}".format(video_id))
continue
print_info(site_info, title, data["format"], data["size"])
if not info_only:
ext = "_{}.{}".format(index, data["format"])
if kwargs.get("zhihu_offset"):
ext = "_{}".format(kwargs["zhihu_offset"]) + ext
download_urls([data["play_url"]], title, ext, data["size"],
output_dir=output_dir, merge=merge, **kwargs)
def zhihu_download_playlist(url, output_dir='.', merge=True, info_only=False, **kwargs):
if "question" not in url or "answer" in url: # question page
raise TypeError("URL does not conform to specifications, Support question only."
" Example URL: https://www.zhihu.com/question/267782048")
url = url.split("?")[0]
if url[-1] == "/":
question_id = url.split("/")[-2]
else:
question_id = url.split("/")[-1]
videos_url = r"https://www.zhihu.com/api/v4/questions/{}/answers".format(question_id)
try:
questions = json.loads(get_content(videos_url))
except json.decoder.JSONDecodeError:
raise TypeError("Check whether the problem URL exists.Example URL: https://www.zhihu.com/question/267782048")
count = 0
while 1:
for data in questions["data"]:
kwargs["zhihu_offset"] = count
zhihu_download("https://www.zhihu.com/question/{}/answer/{}".format(question_id, data["id"]),
output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
count += 1
if questions["paging"]["is_end"]:
return
questions = json.loads(get_content(questions["paging"]["next"], headers=fake_headers))
site_info = "zhihu.com"
download = zhihu_download
download_playlist = zhihu_download_playlist
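A possible invocation of the new extractor, using the example URLs quoted in the error messages above (paths and module name are assumptions based on the project layout; `info_only` avoids an actual download):

```
# Assuming the module is importable as you_get.extractors.zhihu.
from you_get.extractors.zhihu import zhihu_download, zhihu_download_playlist

zhihu_download('https://zhuanlan.zhihu.com/p/51669862', info_only=True)
zhihu_download_playlist('https://www.zhihu.com/question/267782048', info_only=True)
```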


@ -11,8 +11,25 @@ def output(video_extractor, pretty_print=True):
out['title'] = ve.title
out['site'] = ve.name
out['streams'] = ve.streams
try:
if ve.dash_streams:
out['streams'].update(ve.dash_streams)
except AttributeError:
pass
try:
if ve.audiolang:
out['audiolang'] = ve.audiolang
except AttributeError:
pass
extra = {}
if getattr(ve, 'referer', None) is not None:
extra["referer"] = ve.referer
if getattr(ve, 'ua', None) is not None:
extra["ua"] = ve.ua
if extra:
out["extra"] = extra
if pretty_print:
print(json.dumps(out, indent=4, sort_keys=True, ensure_ascii=False))
print(json.dumps(out, indent=4, ensure_ascii=False))
else:
print(json.dumps(out))
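For context, a sketch of the object this now serializes, including the new `extra` block; the values are placeholders, and the full set of top-level keys follows the rest of output(), not all of which is visible in this hunk:

```
import json

# Placeholder example of the structure output() dumps after this change.
out = {
    'title': 'Some title',
    'site': 'Example',
    'streams': {
        '__default__': {
            'container': 'mp4',
            'size': 12345678,
            'src': ['https://example.com/video.mp4'],
        }
    },
    'extra': {'referer': 'https://example.com/', 'ua': 'Mozilla/5.0'},
}
print(json.dumps(out, indent=4, ensure_ascii=False))
```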
@ -31,6 +48,11 @@ def print_info(site_info=None, title=None, type=None, size=None):
def download_urls(urls=None, title=None, ext=None, total_size=None, refer=None):
ve = last_info
if not ve:
ve = VideoExtractor()
ve.name = ''
ve.url = urls
ve.title=title
# save download info in streams
stream = {}
stream['container'] = ext
@ -42,4 +64,3 @@ def download_urls(urls=None, title=None, ext=None, total_size=None, refer=None):
ve.streams = {}
ve.streams['__default__'] = stream
output(ve)

src/you_get/processor/ffmpeg.py Normal file → Executable file

@ -1,69 +1,102 @@
#!/usr/bin/env python
import os.path
import logging
import os
import subprocess
import sys
from ..util.strings import parameterize
from ..common import print_more_compatible as print
try:
from subprocess import DEVNULL
except ImportError:
# Python 3.2 or below
import os
import atexit
DEVNULL = os.open(os.devnull, os.O_RDWR)
atexit.register(lambda fd: os.close(fd), DEVNULL)
def get_usable_ffmpeg(cmd):
try:
p = subprocess.Popen([cmd, '-version'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p = subprocess.Popen([cmd, '-version'], stdin=DEVNULL, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
vers = str(out, 'utf-8').split('\n')[0].split()
assert (vers[0] == 'ffmpeg' and vers[2][0] > '0') or (vers[0] == 'avconv')
# If the version string is unusual, like 'N-1234-gd1111', fall back to [1, 0]
try:
version = [int(i) for i in vers[2].split('.')]
v = vers[2][1:] if vers[2][0] == 'n' else vers[2]
version = [int(i) for i in v.split('.')]
except:
version = [1, 0]
return cmd, version
return cmd, 'ffprobe', version
except:
return None
FFMPEG, FFMPEG_VERSION = get_usable_ffmpeg('ffmpeg') or get_usable_ffmpeg('avconv') or (None, None)
FFMPEG, FFPROBE, FFMPEG_VERSION = get_usable_ffmpeg('ffmpeg') or get_usable_ffmpeg('avconv') or (None, None, None)
if logging.getLogger().isEnabledFor(logging.DEBUG):
LOGLEVEL = ['-loglevel', 'info']
STDIN = None
else:
LOGLEVEL = ['-loglevel', 'quiet']
STDIN = DEVNULL
def has_ffmpeg_installed():
return FFMPEG is not None
# Given a list of segments and the output path, generates the concat
# list and returns the path to the concat list.
def generate_concat_list(files, output):
concat_list_path = output + '.txt'
concat_list_dir = os.path.dirname(concat_list_path)
with open(concat_list_path, 'w', encoding='utf-8') as concat_list:
for file in files:
if os.path.isfile(file):
relpath = os.path.relpath(file, start=concat_list_dir)
concat_list.write('file %s\n' % parameterize(relpath))
return concat_list_path
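The list written here is the input format of FFmpeg's concat demuxer. Roughly, for two segments (file names below are placeholders; parameterize() handles the actual quoting and escaping):

```
# What generate_concat_list() produces, approximately:
#
#   merged.mp4.txt:
#       file part-1.mp4
#       file part-2.mp4
#
# which the callers then feed to the concat demuxer, e.g.:
#   ffmpeg -y -f concat -safe -1 -i merged.mp4.txt -c copy -- merged.mp4
concat_list_path = generate_concat_list(['part-1.mp4', 'part-2.mp4'], 'merged.mp4')
```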
def ffmpeg_concat_av(files, output, ext):
print('Merging video parts... ', end="", flush=True)
params = [FFMPEG] + LOGLEVEL
for file in files:
if os.path.isfile(file): params.extend(['-i', file])
params.extend(['-c', 'copy'])
params.extend(['--', output])
if subprocess.call(params, stdin=STDIN):
print('Merging without re-encode failed.\nTry again re-encoding audio... ', end="", flush=True)
try: os.remove(output)
except FileNotFoundError: pass
params = [FFMPEG] + LOGLEVEL
for file in files:
if os.path.isfile(file): params.extend(['-i', file])
params.extend(['-c:v', 'copy'])
if ext == 'mp4':
params.extend(['-c:a', 'aac'])
elif ext == 'webm':
params.extend(['-c:a', 'vorbis'])
params.extend(['-strict', 'experimental'])
params.append(output)
return subprocess.call(params)
elif ext == 'webm':
params.extend(['-c:a', 'opus'])
params.extend(['--', output])
return subprocess.call(params, stdin=STDIN)
else:
return 0
def ffmpeg_convert_ts_to_mkv(files, output='output.mkv'):
for file in files:
if os.path.isfile(file):
params = [FFMPEG] + LOGLEVEL
params.extend(['-y', '-i', file, output])
subprocess.call(params)
params.extend(['-y', '-i', file])
params.extend(['--', output])
subprocess.call(params, stdin=STDIN)
return
def ffmpeg_concat_mp4_to_mpg(files, output='output.mpg'):
# Use concat demuxer on FFmpeg >= 1.1
if FFMPEG == 'ffmpeg' and (FFMPEG_VERSION[0] >= 2 or (FFMPEG_VERSION[0] == 1 and FFMPEG_VERSION[1] >= 1)):
concat_list = open(output + '.txt', 'w', encoding="utf-8")
for file in files:
if os.path.isfile(file):
concat_list.write("file %s\n" % parameterize(file))
concat_list.close()
params = [FFMPEG] + LOGLEVEL
params.extend(['-f', 'concat', '-safe', '-1', '-y', '-i'])
params.append(output + '.txt')
params += ['-c', 'copy', output]
if subprocess.call(params) == 0:
concat_list = generate_concat_list(files, output)
params = [FFMPEG] + LOGLEVEL + ['-y', '-f', 'concat', '-safe', '-1',
'-i', concat_list, '-c', 'copy']
params.extend(['--', output])
if subprocess.call(params, stdin=STDIN) == 0:
os.remove(output + '.txt')
return True
else:
@ -73,7 +106,7 @@ def ffmpeg_concat_mp4_to_mpg(files, output='output.mpg'):
if os.path.isfile(file):
params = [FFMPEG] + LOGLEVEL + ['-y', '-i']
params.extend([file, file + '.mpg'])
subprocess.call(params)
subprocess.call(params, stdin=STDIN)
inputs = [open(file + '.mpg', 'rb') for file in files]
with open(output + '.mpg', 'wb') as o:
@ -83,10 +116,9 @@ def ffmpeg_concat_mp4_to_mpg(files, output='output.mpg'):
params = [FFMPEG] + LOGLEVEL + ['-y', '-i']
params.append(output + '.mpg')
params += ['-vcodec', 'copy', '-acodec', 'copy']
params.append(output)
subprocess.call(params)
params.extend(['--', output])
if subprocess.call(params) == 0:
if subprocess.call(params, stdin=STDIN) == 0:
for file in files:
os.remove(file + '.mpg')
os.remove(output + '.mpg')
@ -101,10 +133,11 @@ def ffmpeg_concat_ts_to_mkv(files, output='output.mkv'):
for file in files:
if os.path.isfile(file):
params[-1] += file + '|'
params += ['-f', 'matroska', '-c', 'copy', output]
params += ['-f', 'matroska', '-c', 'copy']
params.extend(['--', output])
try:
if subprocess.call(params) == 0:
if subprocess.call(params, stdin=STDIN) == 0:
return True
else:
return False
@ -115,19 +148,12 @@ def ffmpeg_concat_flv_to_mp4(files, output='output.mp4'):
print('Merging video parts... ', end="", flush=True)
# Use concat demuxer on FFmpeg >= 1.1
if FFMPEG == 'ffmpeg' and (FFMPEG_VERSION[0] >= 2 or (FFMPEG_VERSION[0] == 1 and FFMPEG_VERSION[1] >= 1)):
concat_list = open(output + '.txt', 'w', encoding="utf-8")
for file in files:
if os.path.isfile(file):
# for escaping rules, see:
# https://www.ffmpeg.org/ffmpeg-utils.html#Quoting-and-escaping
concat_list.write("file %s\n" % parameterize(file))
concat_list.close()
params = [FFMPEG] + LOGLEVEL + ['-f', 'concat', '-safe', '-1', '-y', '-i']
params.append(output + '.txt')
params += ['-c', 'copy', output]
subprocess.check_call(params)
concat_list = generate_concat_list(files, output)
params = [FFMPEG] + LOGLEVEL + ['-y', '-f', 'concat', '-safe', '-1',
'-i', concat_list, '-c', 'copy',
'-bsf:a', 'aac_adtstoasc']
params.extend(['--', output])
subprocess.check_call(params, stdin=STDIN)
os.remove(output + '.txt')
return True
@ -138,7 +164,7 @@ def ffmpeg_concat_flv_to_mp4(files, output='output.mp4'):
params += ['-map', '0', '-c', 'copy', '-f', 'mpegts', '-bsf:v', 'h264_mp4toannexb']
params.append(file + '.ts')
subprocess.call(params)
subprocess.call(params, stdin=STDIN)
params = [FFMPEG] + LOGLEVEL + ['-y', '-i']
params.append('concat:')
@ -147,32 +173,41 @@ def ffmpeg_concat_flv_to_mp4(files, output='output.mp4'):
if os.path.isfile(f):
params[-1] += f + '|'
if FFMPEG == 'avconv':
params += ['-c', 'copy', output]
params += ['-c', 'copy']
else:
params += ['-c', 'copy', '-absf', 'aac_adtstoasc', output]
params += ['-c', 'copy', '-absf', 'aac_adtstoasc']
params.extend(['--', output])
if subprocess.call(params) == 0:
if subprocess.call(params, stdin=STDIN) == 0:
for file in files:
os.remove(file + '.ts')
return True
else:
raise
def ffmpeg_concat_mp3_to_mp3(files, output='output.mp3'):
print('Merging video parts... ', end="", flush=True)
files = 'concat:' + '|'.join(files)
params = [FFMPEG] + LOGLEVEL + ['-y']
params += ['-i', files, '-acodec', 'copy']
params.extend(['--', output])
subprocess.call(params)
return True
def ffmpeg_concat_mp4_to_mp4(files, output='output.mp4'):
print('Merging video parts... ', end="", flush=True)
# Use concat demuxer on FFmpeg >= 1.1
if FFMPEG == 'ffmpeg' and (FFMPEG_VERSION[0] >= 2 or (FFMPEG_VERSION[0] == 1 and FFMPEG_VERSION[1] >= 1)):
concat_list = open(output + '.txt', 'w', encoding="utf-8")
for file in files:
if os.path.isfile(file):
concat_list.write("file %s\n" % parameterize(file))
concat_list.close()
params = [FFMPEG] + LOGLEVEL + ['-f', 'concat', '-safe', '-1', '-y', '-i']
params.append(output + '.txt')
params += ['-c', 'copy', '-bsf:a', 'aac_adtstoasc', output]
subprocess.check_call(params)
concat_list = generate_concat_list(files, output)
params = [FFMPEG] + LOGLEVEL + ['-y', '-f', 'concat', '-safe', '-1',
'-i', concat_list, '-c', 'copy',
'-bsf:a', 'aac_adtstoasc']
params.extend(['--', output])
subprocess.check_call(params, stdin=STDIN)
os.remove(output + '.txt')
return True
@ -183,7 +218,7 @@ def ffmpeg_concat_mp4_to_mp4(files, output='output.mp4'):
params += ['-c', 'copy', '-f', 'mpegts', '-bsf:v', 'h264_mp4toannexb']
params.append(file + '.ts')
subprocess.call(params)
subprocess.call(params, stdin=STDIN)
params = [FFMPEG] + LOGLEVEL + ['-y', '-i']
params.append('concat:')
@ -192,19 +227,20 @@ def ffmpeg_concat_mp4_to_mp4(files, output='output.mp4'):
if os.path.isfile(f):
params[-1] += f + '|'
if FFMPEG == 'avconv':
params += ['-c', 'copy', output]
params += ['-c', 'copy']
else:
params += ['-c', 'copy', '-absf', 'aac_adtstoasc', output]
params += ['-c', 'copy', '-absf', 'aac_adtstoasc']
params.extend(['--', output])
subprocess.check_call(params)
subprocess.check_call(params, stdin=STDIN)
for file in files:
os.remove(file + '.ts')
return True
def ffmpeg_download_stream(files, title, ext, params={}, output_dir='.'):
def ffmpeg_download_stream(files, title, ext, params={}, output_dir='.', stream=True):
"""str, str->True
WARNING: NOT THE SAME PARMS AS OTHER FUNCTIONS!!!!!!
You can basicly download anything with this function
You can basically download anything with this function
but better leave it alone with
"""
output = title + '.' + ext
@ -212,25 +248,25 @@ def ffmpeg_download_stream(files, title, ext, params={}, output_dir='.'):
if not (output_dir == '.'):
output = output_dir + '/' + output
ffmpeg_params = []
#should these exist...
print('Downloading streaming content with FFmpeg, press q to stop recording...')
if stream:
ffmpeg_params = [FFMPEG] + ['-y', '-re', '-i']
else:
ffmpeg_params = [FFMPEG] + ['-y', '-i']
ffmpeg_params.append(files) #not the same here!!!!
if FFMPEG == 'avconv': #who cares?
ffmpeg_params += ['-c', 'copy']
else:
ffmpeg_params += ['-c', 'copy', '-bsf:a', 'aac_adtstoasc']
if params is not None:
if len(params) > 0:
for k, v in params:
ffmpeg_params.append(k)
ffmpeg_params.append(v)
print('Downloading streaming content with FFmpeg, press q to stop recording...')
ffmpeg_params = [FFMPEG] + ['-y', '-re', '-i']
ffmpeg_params.append(files) #not the same here!!!!
if FFMPEG == 'avconv': #who cares?
ffmpeg_params += ['-c', 'copy', output]
else:
ffmpeg_params += ['-c', 'copy', '-bsf:a', 'aac_adtstoasc']
ffmpeg_params.append(output)
ffmpeg_params.extend(['--', output])
print(' '.join(ffmpeg_params))
@ -244,3 +280,31 @@ def ffmpeg_download_stream(files, title, ext, params={}, output_dir='.'):
pass
return True
def ffmpeg_concat_audio_and_video(files, output, ext):
print('Merging video and audio parts... ', end="", flush=True)
if has_ffmpeg_installed():
params = [FFMPEG] + LOGLEVEL
params.extend(['-f', 'concat'])
params.extend(['-safe', '0']) # https://stackoverflow.com/questions/38996925/ffmpeg-concat-unsafe-file-name
for file in files:
if os.path.isfile(file):
params.extend(['-i', file])
params.extend(['-c:v', 'copy'])
params.extend(['-c:a', 'aac'])
params.extend(['-strict', 'experimental'])
params.extend(['--', output + "." + ext])
return subprocess.call(params, stdin=STDIN)
else:
raise EnvironmentError('No ffmpeg found')
def ffprobe_get_media_duration(file):
print('Getting {} duration'.format(file))
params = [FFPROBE]
params.extend(['-i', file])
params.extend(['-show_entries', 'format=duration'])
params.extend(['-v', 'quiet'])
params.extend(['-of', 'csv=p=0'])
return subprocess.check_output(params, stdin=STDIN, stderr=subprocess.STDOUT).decode().strip()
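Hypothetical use of the new helper (requires ffprobe on PATH and an existing media file; the file name is made up):

```
# Returns the duration in seconds as a string, e.g. '12.345000'.
duration = ffprobe_get_media_duration('clip.mp4')
print('clip.mp4 runs for %s seconds' % duration)
```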


@ -1,8 +1,8 @@
#!/usr/bin/env python
import platform
from .os import detect_os
def legitimize(text, os=platform.system()):
def legitimize(text, os=detect_os()):
"""Converts a string to a valid filename.
"""
@ -13,7 +13,8 @@ def legitimize(text, os=platform.system()):
ord('|'): '-',
})
if os == 'Windows':
# FIXME: do some filesystem detection
if os == 'windows' or os == 'cygwin' or os == 'wsl':
# Windows (non-POSIX namespace)
text = text.translate({
# Reserved in Windows VFAT and NTFS
@ -28,10 +29,11 @@ def legitimize(text, os=platform.system()):
ord('>'): '-',
ord('['): '(',
ord(']'): ')',
ord('\t'): ' ',
})
else:
# *nix
if os == 'Darwin':
if os == 'mac':
# Mac OS HFS+
text = text.translate({
ord(':'): '-',
@ -41,5 +43,5 @@ def legitimize(text, os=platform.system()):
if text.startswith("."):
text = text[1:]
text = text[:82] # Trim to 82 Unicode characters long
text = text[:80] # Trim to 80 Unicode characters long
return text
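A quick illustration of how the platform-specific tables above behave (results assume the full translation tables in the source file, of which only part is visible in this hunk):

```
# '/' is always replaced; Windows-style targets additionally replace
# reserved characters such as ':' and '?'.
print(legitimize('a/b:c?', os='linux'))    # a-b:c?
print(legitimize('a/b:c?', os='windows'))  # a-b-c-
```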


@ -5,13 +5,13 @@ from ..version import script_name
import os, sys
IS_ANSI_TERMINAL = os.getenv('TERM') in (
TERM = os.getenv('TERM', '')
IS_ANSI_TERMINAL = TERM in (
'eterm-color',
'linux',
'screen',
'vt100',
'xterm',
)
) or TERM.startswith('xterm')
# ANSI escape code
# See <http://en.wikipedia.org/wiki/ANSI_escape_code>
@ -89,10 +89,14 @@ def e(message, exit_code=None):
"""Print an error log message."""
print_log(message, YELLOW, BOLD)
if exit_code is not None:
exit(exit_code)
sys.exit(exit_code)
def wtf(message, exit_code=1):
"""What a Terrible Failure!"""
print_log(message, RED, BOLD)
if exit_code is not None:
exit(exit_code)
sys.exit(exit_code)
def yes_or_no(message):
ans = str(input('%s (y/N) ' % message)).lower().strip()
return ans == 'y'

src/you_get/util/os.py Normal file

@ -0,0 +1,32 @@
#!/usr/bin/env python
from platform import system
def detect_os():
"""Detect operating system.
"""
# Inspired by:
# https://github.com/scivision/pybashutils/blob/78b7f2b339cb03b1c37df94015098bbe462f8526/pybashutils/windows_linux_detect.py
syst = system().lower()
os = 'unknown'
if 'cygwin' in syst:
os = 'cygwin'
elif 'darwin' in syst:
os = 'mac'
elif 'linux' in syst:
os = 'linux'
# detect WSL https://github.com/Microsoft/BashOnWindows/issues/423
try:
with open('/proc/version', 'r') as f:
if 'microsoft' in f.read().lower():
os = 'wsl'
except: pass
elif 'windows' in syst:
os = 'windows'
elif 'bsd' in syst:
os = 'bsd'
return os
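And a minimal example of how the rest of the code is expected to use it (compare the new check in legitimize() above):

```
# detect_os() returns a lowercase token such as 'linux', 'mac',
# 'windows', 'cygwin', 'wsl', 'bsd' or 'unknown'.
os_name = detect_os()
if os_name in ('windows', 'cygwin', 'wsl'):
    print('Using Windows-safe filename rules')
else:
    print('Detected OS: %s' % os_name)
```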

Some files were not shown because too many files have changed in this diff.