Mirror of https://github.com/soimort/you-get.git (synced 2025-03-13 03:17:44 +03:00)
Commit e0648a2ef8
.github/ISSUE_TEMPLATE.md (vendored)
@ -1,39 +0,0 @@
Please make sure these boxes are checked before submitting your issue – thank you!

- [ ] You can actually watch the video in your browser or mobile application, but cannot download it with `you-get`.
- [ ] Your `you-get` is up-to-date.
- [ ] I have read <https://github.com/soimort/you-get/wiki/FAQ> and tried the advice there.
- [ ] The issue is not yet reported on <https://github.com/soimort/you-get/issues> or <https://github.com/soimort/you-get/wiki/Known-Bugs>. If it is, please add your comments under the existing issue.
- [ ] The issue (or question) is really about `you-get`, not about some other code or project.

Run the command with the `--debug` option, and paste the full output inside the fences:

```
[PASTE IN ME]
```

If there's anything else you would like to say (e.g. in case your issue is not about downloading a specific video; it might as well be a general discussion or a proposal for a new feature), fill in the box below; otherwise, you may want to post an emoji or meme instead:

> [WRITE SOMETHING]
> [OR HAVE SOME :icecream:!]
Chinese translation last updated: 2016-02-26

Before submitting, please make sure you have checked the following!

- [ ] You can watch the video in a browser or on mobile, but cannot download it with `you-get`.
- [ ] Your `you-get` is the latest version.
- [ ] I have read <https://github.com/soimort/you-get/wiki/FAQ> and followed the instructions there.
- [ ] The issue has not been reported on <https://github.com/soimort/you-get/issues>, <https://github.com/soimort/you-get/wiki/FAQ> or <https://github.com/soimort/you-get/wiki/Known-Bugs>; otherwise, please report it under the existing issue.
- [ ] The issue is really about `you-get`, not about some other project.

Run the command with `--debug` and paste the output below:

```
[PASTE THE FULL LOG HERE]
```

If you have anything else to add (e.g. the issue only occurs with a specific video, or this is a general discussion or a proposal for a new feature), add it below; or you can just post something cute instead:

> [YOUR MESSAGE]
> [OR LICK THE :icecream:!]
.github/PULL_REQUEST_TEMPLATE.md (vendored)
@ -1,48 +0,0 @@
**(PLEASE DELETE ALL THESE AFTER READING)**

Thank you for the pull request! `you-get` is a growing open source project, which would not have been possible without contributors like you.

Here are some simple rules to follow; please recheck them before sending the pull request:

- [ ] If you want to propose two or more unrelated patches, please open separate pull requests for them, instead of one;
- [ ] All pull requests should be based upon the latest `develop` branch;
- [ ] Name your branch (from which you will send the pull request) properly; use a meaningful name like `add-this-shining-feature` rather than just `develop`;
- [ ] All commit messages, as well as comments in code, should be written in understandable English.

As a contributor, you must be aware that

- [ ] You agree to contribute your code to this project, under the terms of the MIT license, so that any person may freely use or redistribute it; of course, you still retain the copyright for your own work.
- [ ] You may not contribute any code not authored by yourself, unless it is in the public domain or licensed under the MIT license.

Not all pull requests can eventually be merged. I consider merged and unmerged patches as equally important for the community: as long as you think a patch would be helpful, someone else might find it helpful too, and they could take your fork and benefit from it in some way. In any case, I would like to thank you in advance for taking the time to contribute to this project.

Cheers,
Mort

**(PLEASE REPLACE ALL ABOVE WITH A DETAILED DESCRIPTION OF YOUR PULL REQUEST)**
Chinese translation last updated: 2016-02-26

**(PLEASE DELETE ALL OF THIS AFTER READING)**

Thank you for the pull request! `you-get` is a steadily growing open source project; thank you for your contribution.

Please recheck the following simple points:

- [ ] If you intend to propose two or more unrelated patches, open a separate pull request for each, rather than a single one;
- [ ] All pull requests should be based on the latest `develop` branch;
- [ ] The branch you send the pull request from should have a meaningful name, e.g. `add-this-shining-feature` rather than just `develop`;
- [ ] All commit messages and code comments should be written in understandable English.

As a contributor, you should be aware that

- [ ] You agree to contribute your code under the MIT license, so that anyone may freely use or distribute it; of course, you still retain the copyright to your own work.
- [ ] You may not contribute code that you did not write yourself, unless it is in the public domain or under the MIT license.

Not all pull requests will be merged, but I consider merged and unmerged patches equally important: if you think a patch is useful, others may think so too, and they can pick up the work from your fork and benefit from it. In any case, thank you for taking the trouble to contribute to this project.

Cheers,
Mort

**(Please replace all of the above with a detailed description of your pull request)**
.travis.yml
@ -1,15 +1,23 @@
# https://travis-ci.org/soimort/you-get
language: python
python:
  - "3.2"
  - "3.3"
  - "3.4"
  - "3.5"
  - "3.6"
  - "nightly"
  - "pypy3"
matrix:
  include:
    - python: "3.7"
      dist: xenial
    - python: "3.8-dev"
      dist: xenial
    - python: "nightly"
      dist: xenial
before_install:
  - pip install flake8
before_script:
  - flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics
script: make test
sudo: false
notifications:
  webhooks:
    urls:
CONTRIBUTING.md
@ -1,27 +1,27 @@
# How to Contribute
# How to Report an Issue

`you-get` is currently experimenting with an aggressive approach to handling issues. Namely, a bug report must be addressed with some code via a pull request.
If you would like to report a problem you find when using `you-get`, please open a [Pull Request](https://github.com/soimort/you-get/pulls), which should include:

## Report a broken extractor
1. A detailed description of the encountered problem;
2. At least one commit, addressing the problem through some unit test(s).
   * Examples of good commits: [#2675](https://github.com/soimort/you-get/pull/2675/files), [#2680](https://github.com/soimort/you-get/pull/2680/files), [#2685](https://github.com/soimort/you-get/pull/2685/files)

**How-To:** Please open a new pull request with the following changes:
PRs that fail to meet the above criteria may be closed summarily with no further action.

* Add a new test case in [tests/test.py](https://github.com/soimort/you-get/blob/develop/tests/test.py), with the failing URL(s); see the sketch below.
A valid PR will remain open until the problem it reports is fixed.

The Travis CI build will (ideally) fail, showing a :x:, which means you have successfully reported a broken extractor.
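For reference, here is a minimal sketch of what such a test-only commit might look like, assuming the `unittest`-based layout of `tests/test.py`; the extractor module and the URL below are placeholders, not taken from an actual report:

```
# Hypothetical addition to tests/test.py. Replace `youtube` and the URL with
# the broken extractor and a link that currently fails to download.
import unittest

from you_get.extractors import youtube


class YouGetTests(unittest.TestCase):

    def test_youtube(self):
        # info_only=True runs only the extraction step, without downloading
        youtube.download(
            'https://www.youtube.com/watch?v=jNQXAC9IVRw', info_only=True
        )


if __name__ == '__main__':
    unittest.main()
```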
Such a valid PR will be either *closed* if it's fixed by another PR, or *merged* if it's fixed by follow-up commits from the reporter themselves.

## Report other issues / Suggest a new feature
# How to Report an Issue (Chinese translation)

**How-To:** Please open a pull request with the proposed changes directly.
To prevent abuse of GitHub Issues, this project does not accept general issues.

A valid PR need not be complete (i.e., can be WIP), but it should contain at least one sensible, nontrivial commit.
If you run into any problem while using `you-get`, please open a [Pull Request](https://github.com/soimort/you-get/pulls). The PR should include:

## Hints
1. A detailed description of the problem;
2. At least one commit whose content is a unit test **related to the problem**. **Do not submit a PR by arbitrarily modifying unrelated files!**
   * Examples of valid commits: [#2675](https://github.com/soimort/you-get/pull/2675/files), [#2680](https://github.com/soimort/you-get/pull/2680/files), [#2685](https://github.com/soimort/you-get/pull/2685/files)

* The [`develop`](https://github.com/soimort/you-get/tree/develop) branch is where your pull request goes.
* Remember to rebase.
* Document your PR clearly, and if applicable, provide some sample links for reviewers to test with.
* Write well-formatted, easy-to-understand commit messages. If you don't know how, look at existing ones.
* We will not ask you to sign a CLA, but you must assure that your code can be legally redistributed (under the terms of the MIT license).
PRs that do not meet the above criteria may be closed directly.

A valid PR will be kept open until the corresponding problem is fixed.
LICENSE.txt
@ -1,15 +1,14 @@
==============================================
This is a copy of the MIT license.
==============================================
Copyright (C) 2012-2017 Mort Yao <mort.yao@gmail.com>
Copyright (C) 2012 Boyu Guo <iambus@gmail.com>
MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
Copyright (c) 2012-2019 Mort Yao <mort.yao@gmail.com>
Copyright (c) 2012 Boyu Guo <iambus@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
README.md
@ -4,6 +4,10 @@
[Build Status](https://travis-ci.org/soimort/you-get)
[Gitter](https://gitter.im/soimort/you-get?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

**NOTICE: Read [this](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md) if you are looking for the conventional "Issues" tab.**

---

[You-Get](https://you-get.org/) is a tiny command-line utility to download media contents (videos, audios, images) from the Web, in case there is no other handy way to do it.

Here's how you use `you-get` to download a video from [YouTube](https://www.youtube.com/watch?v=jNQXAC9IVRw):

@ -49,10 +53,10 @@ Are you a Python programmer? Then check out [the source](https://github.com/soim

### Prerequisites

The following dependencies are required and must be installed separately, unless you are using a pre-built package or chocolatey on Windows:
The following dependencies are necessary:

* **[Python 3](https://www.python.org/downloads/)**
* **[FFmpeg](https://www.ffmpeg.org/)** (strongly recommended) or [Libav](https://libav.org/)
* **[Python](https://www.python.org/downloads/)** 3.2 or above
* **[FFmpeg](https://www.ffmpeg.org/)** 1.0 or above
* (Optional) [RTMPDump](https://rtmpdump.mplayerhq.hu/)

### Option 1: Install via pip

@ -61,17 +65,13 @@ The official release of `you-get` is distributed on [PyPI](https://pypi.python.o

    $ pip3 install you-get

### Option 2: Install via [Antigen](https://github.com/zsh-users/antigen)
### Option 2: Install via [Antigen](https://github.com/zsh-users/antigen) (for Zsh users)

Add the following line to your `.zshrc`:

    antigen bundle soimort/you-get

### Option 3: Use a pre-built package (Windows only)

Download the `exe` (standalone) or `7z` (all dependencies included) from: <https://github.com/soimort/you-get/releases/latest>.

### Option 4: Download from GitHub
### Option 3: Download from GitHub

You may either download the [stable](https://github.com/soimort/you-get/archive/master.zip) (identical with the latest release on PyPI) or the [develop](https://github.com/soimort/you-get/archive/develop.zip) (more hotfixes, unstable features) branch of `you-get`. Unzip it, and put the directory containing the `you-get` script into your `PATH`.
@ -89,7 +89,7 @@ $ python3 setup.py install --user

to install `you-get` to a permanent path.

### Option 5: Git clone
### Option 4: Git clone

This is the recommended way for all developers, even if you don't often code in Python.

@ -99,13 +99,7 @@ $ git clone git://github.com/soimort/you-get.git

Then put the cloned directory into your `PATH`, or run `./setup.py install` to install `you-get` to a permanent path.

### Option 6: Using [Chocolatey](https://chocolatey.org/) (Windows only)

```
> choco install you-get
```

### Option 7: Homebrew (Mac only)
### Option 5: Homebrew (Mac only)

You can install `you-get` easily via:

@ -113,6 +107,14 @@ You can install `you-get` easily via:
$ brew install you-get
```

### Option 6: pkg (FreeBSD only)

You can install `you-get` easily via:

```
# pkg install you-get
```

### Shell completion

Completion definitions for Bash, Fish and Zsh can be found in [`contrib/completion`](https://github.com/soimort/you-get/tree/develop/contrib/completion). Please consult your shell's manual for how to take advantage of them.
@ -131,12 +133,6 @@ or download the latest release via:

$ you-get https://github.com/soimort/you-get/archive/master.zip
```

or use [chocolatey package manager](https://chocolatey.org):

```
> choco upgrade you-get
```

In order to get the latest ```develop``` branch without messing up the PIP, you can try:

```
@ -154,22 +150,54 @@ $ you-get -i 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
|
||||
site: YouTube
|
||||
title: Me at the zoo
|
||||
streams: # Available quality and codecs
|
||||
[ DASH ] ____________________________________
|
||||
- itag: 242
|
||||
container: webm
|
||||
quality: 320x240
|
||||
size: 0.6 MiB (618358 bytes)
|
||||
# download-with: you-get --itag=242 [URL]
|
||||
|
||||
- itag: 395
|
||||
container: mp4
|
||||
quality: 320x240
|
||||
size: 0.5 MiB (550743 bytes)
|
||||
# download-with: you-get --itag=395 [URL]
|
||||
|
||||
- itag: 133
|
||||
container: mp4
|
||||
quality: 320x240
|
||||
size: 0.5 MiB (498558 bytes)
|
||||
# download-with: you-get --itag=133 [URL]
|
||||
|
||||
- itag: 278
|
||||
container: webm
|
||||
quality: 192x144
|
||||
size: 0.4 MiB (392857 bytes)
|
||||
# download-with: you-get --itag=278 [URL]
|
||||
|
||||
- itag: 160
|
||||
container: mp4
|
||||
quality: 192x144
|
||||
size: 0.4 MiB (370882 bytes)
|
||||
# download-with: you-get --itag=160 [URL]
|
||||
|
||||
- itag: 394
|
||||
container: mp4
|
||||
quality: 192x144
|
||||
size: 0.4 MiB (367261 bytes)
|
||||
# download-with: you-get --itag=394 [URL]
|
||||
|
||||
[ DEFAULT ] _________________________________
|
||||
- itag: 43
|
||||
container: webm
|
||||
quality: medium
|
||||
size: 0.5 MiB (564215 bytes)
|
||||
size: 0.5 MiB (568748 bytes)
|
||||
# download-with: you-get --itag=43 [URL]
|
||||
|
||||
- itag: 18
|
||||
container: mp4
|
||||
quality: medium
|
||||
# download-with: you-get --itag=18 [URL]
|
||||
|
||||
- itag: 5
|
||||
container: flv
|
||||
quality: small
|
||||
# download-with: you-get --itag=5 [URL]
|
||||
# download-with: you-get --itag=18 [URL]
|
||||
|
||||
- itag: 36
|
||||
container: 3gp
|
||||
@ -182,23 +210,24 @@ streams: # Available quality and codecs
|
||||
# download-with: you-get --itag=17 [URL]
|
||||
```
|
||||
|
||||
The format marked with `DEFAULT` is the one you will get by default. If that looks cool to you, download it:
|
||||
By default, the one on the top is the one you will get. If that looks cool to you, download it:
|
||||
|
||||
```
|
||||
$ you-get 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
|
||||
site: YouTube
|
||||
title: Me at the zoo
|
||||
stream:
|
||||
- itag: 43
|
||||
- itag: 242
|
||||
container: webm
|
||||
quality: medium
|
||||
size: 0.5 MiB (564215 bytes)
|
||||
# download-with: you-get --itag=43 [URL]
|
||||
quality: 320x240
|
||||
size: 0.6 MiB (618358 bytes)
|
||||
# download-with: you-get --itag=242 [URL]
|
||||
|
||||
Downloading zoo.webm ...
|
||||
100.0% ( 0.5/0.5 MB) ├████████████████████████████████████████┤[1/1] 7 MB/s
|
||||
Downloading Me at the zoo.webm ...
|
||||
100% ( 0.6/ 0.6MB) ├██████████████████████████████████████████████████████████████████████████████┤[2/2] 2 MB/s
|
||||
Merging video parts... Merged into Me at the zoo.webm
|
||||
|
||||
Saving Me at the zoo.en.srt ...Done.
|
||||
Saving Me at the zoo.en.srt ... Done.
|
||||
```
|
||||
|
||||
(If a YouTube video has any closed captions, they will be downloaded together with the video file, in SubRip subtitle format.)
|
||||
@ -298,7 +327,7 @@ However, the system proxy setting (i.e. the environment variable `http_proxy`) i

### Watch a video

Use the `--player`/`-p` option to feed the video into your media player of choice, e.g. `mplayer` or `vlc`, instead of downloading it:
Use the `--player`/`-p` option to feed the video into your media player of choice, e.g. `mpv` or `vlc`, instead of downloading it:

```
$ you-get -p vlc 'https://www.youtube.com/watch?v=jNQXAC9IVRw'
@ -374,11 +403,10 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| **niconico<br/>ニコニコ動画** | <http://www.nicovideo.jp/> |✓| | |
| **163<br/>网易视频<br/>网易云音乐** | <http://v.163.com/><br/><http://music.163.com/> |✓| |✓|
| 56网 | <http://www.56.com/> |✓| | |
| **AcFun** | <http://www.acfun.tv/> |✓| | |
| **AcFun** | <http://www.acfun.cn/> |✓| | |
| **Baidu<br/>百度贴吧** | <http://tieba.baidu.com/> |✓|✓| |
| 爆米花网 | <http://www.baomihua.com/> |✓| | |
| **bilibili<br/>哔哩哔哩** | <http://www.bilibili.com/> |✓| | |
| Dilidili | <http://www.dilidili.com/> |✓| | |
| 豆瓣 | <http://www.douban.com/> |✓| |✓|
| 斗鱼 | <http://www.douyutv.com/> |✓| | |
| Panda<br/>熊猫 | <http://www.panda.tv/> |✓| | |
@ -407,15 +435,16 @@ Use `--url`/`-u` to get a list of downloadable resource URLs extracted from the
| **Youku<br/>优酷** | <http://www.youku.com/> |✓| | |
| 战旗TV | <http://www.zhanqi.tv/lives> |✓| | |
| 央视网 | <http://www.cntv.cn/> |✓| | |
| 花瓣 | <http://huaban.com/> | |✓| |
| Naver<br/>네이버 | <http://tvcast.naver.com/> |✓| | |
| 芒果TV | <http://www.mgtv.com/> |✓| | |
| 火猫TV | <http://www.huomao.com/> |✓| | |
| 全民直播 | <http://www.quanmin.tv/> |✓| | |
| 阳光宽频网 | <http://www.365yg.com/> |✓| | |
| 西瓜视频 | <https://www.ixigua.com/> |✓| | |
| 快手 | <https://www.kuaishou.com/> |✓|✓| |
| 抖音 | <https://www.douyin.com/> |✓| | |
| TikTok | <https://www.tiktok.com/> |✓| | |
| 中国体育(TV) | <http://v.zhibo.tv/> </br><http://video.zhibo.tv/> |✓| | |
| 知乎 | <https://www.zhihu.com/> |✓| | |

For all other sites not on the list, the universal extractor will take care of finding and downloading interesting resources from the page.

@ -423,7 +452,7 @@ For all other sites not on the list, the universal extractor will take care of f

If something is broken and `you-get` can't get you things you want, don't panic. (Yes, this happens all the time!)

Check if it's already a known problem on <https://github.com/soimort/you-get/wiki/Known-Bugs>. If not, follow the guidelines on [how to report a broken extractor](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md#report-a-broken-extractor).
Check if it's already a known problem on <https://github.com/soimort/you-get/wiki/Known-Bugs>. If not, follow the guidelines on [how to report an issue](https://github.com/soimort/you-get/blob/develop/CONTRIBUTING.md).
|
||||
## Getting Involved
|
||||
|
||||
|
@ -10,6 +10,7 @@ import socket
|
||||
import locale
|
||||
import logging
|
||||
import argparse
|
||||
import ssl
|
||||
from http import cookiejar
|
||||
from importlib import import_module
|
||||
from urllib import request, parse, error
|
||||
@ -24,6 +25,7 @@ sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8')
|
||||
SITES = {
|
||||
'163' : 'netease',
|
||||
'56' : 'w56',
|
||||
'365yg' : 'toutiao',
|
||||
'acfun' : 'acfun',
|
||||
'archive' : 'archive',
|
||||
'baidu' : 'baidu',
|
||||
@ -36,13 +38,11 @@ SITES = {
|
||||
'cbs' : 'cbs',
|
||||
'coub' : 'coub',
|
||||
'dailymotion' : 'dailymotion',
|
||||
'dilidili' : 'dilidili',
|
||||
'douban' : 'douban',
|
||||
'douyin' : 'douyin',
|
||||
'douyu' : 'douyutv',
|
||||
'ehow' : 'ehow',
|
||||
'facebook' : 'facebook',
|
||||
'fantasy' : 'fantasy',
|
||||
'fc2' : 'fc2video',
|
||||
'flickr' : 'flickr',
|
||||
'freesound' : 'freesound',
|
||||
@ -50,7 +50,6 @@ SITES = {
|
||||
'google' : 'google',
|
||||
'giphy' : 'giphy',
|
||||
'heavy-music' : 'heavymusic',
|
||||
'huaban' : 'huaban',
|
||||
'huomao' : 'huomaotv',
|
||||
'iask' : 'sina',
|
||||
'icourses' : 'icourses',
|
||||
@ -64,6 +63,7 @@ SITES = {
|
||||
'iqiyi' : 'iqiyi',
|
||||
'ixigua' : 'ixigua',
|
||||
'isuntv' : 'suntv',
|
||||
'iwara' : 'iwara',
|
||||
'joy' : 'joy',
|
||||
'kankanews' : 'bilibili',
|
||||
'khanacademy' : 'khan',
|
||||
@ -74,6 +74,7 @@ SITES = {
|
||||
'le' : 'le',
|
||||
'letv' : 'le',
|
||||
'lizhi' : 'lizhi',
|
||||
'longzhu' : 'longzhu',
|
||||
'magisto' : 'magisto',
|
||||
'metacafe' : 'metacafe',
|
||||
'mgtv' : 'mgtv',
|
||||
@ -81,16 +82,15 @@ SITES = {
|
||||
'mixcloud' : 'mixcloud',
|
||||
'mtv81' : 'mtv81',
|
||||
'musicplayon' : 'musicplayon',
|
||||
'miaopai' : 'yixia',
|
||||
'naver' : 'naver',
|
||||
'7gogo' : 'nanagogo',
|
||||
'nicovideo' : 'nicovideo',
|
||||
'panda' : 'panda',
|
||||
'pinterest' : 'pinterest',
|
||||
'pixnet' : 'pixnet',
|
||||
'pptv' : 'pptv',
|
||||
'qingting' : 'qingting',
|
||||
'qq' : 'qq',
|
||||
'quanmin' : 'quanmin',
|
||||
'showroom-live' : 'showroom',
|
||||
'sina' : 'sina',
|
||||
'smgbb' : 'bilibili',
|
||||
@ -98,6 +98,7 @@ SITES = {
|
||||
'soundcloud' : 'soundcloud',
|
||||
'ted' : 'ted',
|
||||
'theplatform' : 'theplatform',
|
||||
'tiktok' : 'tiktok',
|
||||
'tucao' : 'tucao',
|
||||
'tudou' : 'tudou',
|
||||
'tumblr' : 'tumblr',
|
||||
@ -117,30 +118,32 @@ SITES = {
|
||||
'xiaojiadianvideo' : 'fc2video',
|
||||
'ximalaya' : 'ximalaya',
|
||||
'yinyuetai' : 'yinyuetai',
|
||||
'miaopai' : 'yixia',
|
||||
'yizhibo' : 'yizhibo',
|
||||
'youku' : 'youku',
|
||||
'iwara' : 'iwara',
|
||||
'youtu' : 'youtube',
|
||||
'youtube' : 'youtube',
|
||||
'zhanqi' : 'zhanqi',
|
||||
'365yg' : 'toutiao',
|
||||
'zhibo' : 'zhibo',
|
||||
'zhihu' : 'zhihu',
|
||||
}
|
||||
|
||||
dry_run = False
|
||||
json_output = False
|
||||
force = False
|
||||
skip_existing_file_size_check = False
|
||||
player = None
|
||||
extractor_proxy = None
|
||||
cookies = None
|
||||
output_filename = None
|
||||
auto_rename = False
|
||||
insecure = False
|
||||
|
||||
fake_headers = {
|
||||
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', # noqa
|
||||
'Accept-Charset': 'UTF-8,*;q=0.5',
|
||||
'Accept-Encoding': 'gzip,deflate,sdch',
|
||||
'Accept-Language': 'en-US,en;q=0.8',
|
||||
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0', # noqa
|
||||
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0', # noqa
|
||||
}
|
||||
|
||||
if sys.stdout.isatty():
|
||||
@ -268,7 +271,15 @@ def matchall(text, patterns):
|
||||
def launch_player(player, urls):
|
||||
import subprocess
|
||||
import shlex
|
||||
subprocess.call(shlex.split(player) + list(urls))
|
||||
if (sys.version_info >= (3, 3)):
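# shutil.which() requires Python 3.3+; use it to verify the player binary exists before launching it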
|
||||
import shutil
|
||||
exefile=shlex.split(player)[0]
|
||||
if shutil.which(exefile) is not None:
|
||||
subprocess.call(shlex.split(player) + list(urls))
|
||||
else:
|
||||
log.wtf('[Failed] Cannot find player "%s"' % exefile)
|
||||
else:
|
||||
subprocess.call(shlex.split(player) + list(urls))
|
||||
|
||||
|
||||
def parse_query_param(url, param):
|
||||
@ -366,20 +377,30 @@ def get_decoded_html(url, faker=False):
|
||||
return data
|
||||
|
||||
|
||||
def get_location(url):
|
||||
def get_location(url, headers=None, get_method='HEAD'):
|
||||
logging.debug('get_location: %s' % url)
|
||||
|
||||
response = request.urlopen(url)
|
||||
# urllib will follow redirections and it's too much code to tell urllib
|
||||
# not to do that
|
||||
return response.geturl()
|
||||
if headers:
|
||||
req = request.Request(url, headers=headers)
|
||||
else:
|
||||
req = request.Request(url)
|
||||
req.get_method = lambda: get_method
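# issue a HEAD request by default, so only the final URL and headers are fetched, not the body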
|
||||
res = urlopen_with_retry(req)
|
||||
return res.geturl()
|
||||
|
||||
|
||||
def urlopen_with_retry(*args, **kwargs):
|
||||
retry_time = 3
|
||||
for i in range(retry_time):
|
||||
try:
|
||||
return request.urlopen(*args, **kwargs)
|
||||
if insecure:
|
||||
# ignore ssl errors
|
||||
ctx = ssl.create_default_context()
|
||||
ctx.check_hostname = False
|
||||
ctx.verify_mode = ssl.CERT_NONE
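# -k/--insecure: both hostname checking and certificate verification are disabled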
|
||||
return request.urlopen(*args, context=ctx, **kwargs)
|
||||
else:
|
||||
return request.urlopen(*args, **kwargs)
|
||||
except socket.timeout as e:
|
||||
logging.debug('request attempt %s timeout' % str(i + 1))
|
||||
if i + 1 == retry_time:
|
||||
@ -423,17 +444,17 @@ def get_content(url, headers={}, decoded=True):
|
||||
# Decode the response body
|
||||
if decoded:
|
||||
charset = match1(
|
||||
response.getheader('Content-Type'), r'charset=([\w-]+)'
|
||||
response.getheader('Content-Type', ''), r'charset=([\w-]+)'
|
||||
)
|
||||
if charset is not None:
|
||||
data = data.decode(charset)
|
||||
data = data.decode(charset, 'ignore')
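# 'ignore' drops bytes that are invalid in the declared charset instead of raising UnicodeDecodeError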
|
||||
else:
|
||||
data = data.decode('utf-8', 'ignore')
|
||||
|
||||
return data
|
||||
|
||||
|
||||
def post_content(url, headers={}, post_data={}, decoded=True):
|
||||
def post_content(url, headers={}, post_data={}, decoded=True, **kwargs):
|
||||
"""Post the content of a URL via sending a HTTP POST request.
|
||||
|
||||
Args:
|
||||
@ -444,14 +465,19 @@ def post_content(url, headers={}, post_data={}, decoded=True):
|
||||
Returns:
|
||||
The content as a string.
|
||||
"""
|
||||
|
||||
logging.debug('post_content: %s \n post_data: %s' % (url, post_data))
|
||||
if kwargs.get('post_data_raw'):
|
||||
logging.debug('post_content: %s\npost_data_raw: %s' % (url, kwargs['post_data_raw']))
|
||||
else:
|
||||
logging.debug('post_content: %s\npost_data: %s' % (url, post_data))
|
||||
|
||||
req = request.Request(url, headers=headers)
|
||||
if cookies:
|
||||
cookies.add_cookie_header(req)
|
||||
req.headers.update(req.unredirected_hdrs)
|
||||
post_data_enc = bytes(parse.urlencode(post_data), 'utf-8')
|
||||
if kwargs.get('post_data_raw'):
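# post_data_raw sends the given string verbatim as the request body (e.g. a JSON payload) instead of urlencoding the post_data dict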
|
||||
post_data_enc = bytes(kwargs['post_data_raw'], 'utf-8')
|
||||
else:
|
||||
post_data_enc = bytes(parse.urlencode(post_data), 'utf-8')
|
||||
response = urlopen_with_retry(req, data=post_data_enc)
|
||||
data = response.read()
|
||||
|
||||
@ -493,7 +519,7 @@ def urls_size(urls, faker=False, headers={}):
|
||||
return sum([url_size(url, faker=faker, headers=headers) for url in urls])
|
||||
|
||||
|
||||
def get_head(url, headers={}, get_method='HEAD'):
|
||||
def get_head(url, headers=None, get_method='HEAD'):
|
||||
logging.debug('get_head: %s' % url)
|
||||
|
||||
if headers:
|
||||
@ -502,7 +528,7 @@ def get_head(url, headers={}, get_method='HEAD'):
|
||||
req = request.Request(url)
|
||||
req.get_method = lambda: get_method
|
||||
res = urlopen_with_retry(req)
|
||||
return dict(res.headers)
|
||||
return res.headers
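# return the HTTPMessage object itself; header lookups on it are case-insensitive, unlike on a plain dict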
|
||||
|
||||
|
||||
def url_info(url, faker=False, headers={}):
|
||||
@ -596,29 +622,60 @@ def url_save(
|
||||
# the key must be 'Referer' for the hack here
|
||||
if refer is not None:
|
||||
tmp_headers['Referer'] = refer
|
||||
file_size = url_size(url, faker=faker, headers=tmp_headers)
|
||||
if type(url) is list:
|
||||
file_size = urls_size(url, faker=faker, headers=tmp_headers)
|
||||
is_chunked, urls = True, url
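# a list of URLs means the stream comes pre-split into chunks; their sizes are summed and the parts are appended into one file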
|
||||
else:
|
||||
file_size = url_size(url, faker=faker, headers=tmp_headers)
|
||||
is_chunked, urls = False, [url]
|
||||
|
||||
if os.path.exists(filepath):
|
||||
if not force and file_size == os.path.getsize(filepath):
|
||||
if not is_part:
|
||||
if bar:
|
||||
bar.done()
|
||||
print(
|
||||
'Skipping {}: file already exists'.format(
|
||||
tr(os.path.basename(filepath))
|
||||
)
|
||||
)
|
||||
continue_renameing = True
|
||||
while continue_renameing:
|
||||
continue_renameing = False
|
||||
if os.path.exists(filepath):
|
||||
if not force and (file_size == os.path.getsize(filepath) or skip_existing_file_size_check):
|
||||
if not is_part:
|
||||
if bar:
|
||||
bar.done()
|
||||
if skip_existing_file_size_check:
|
||||
log.w(
|
||||
'Skipping {} without checking size: file already exists'.format(
|
||||
tr(os.path.basename(filepath))
|
||||
)
|
||||
)
|
||||
else:
|
||||
log.w(
|
||||
'Skipping {}: file already exists'.format(
|
||||
tr(os.path.basename(filepath))
|
||||
)
|
||||
)
|
||||
else:
|
||||
if bar:
|
||||
bar.update_received(file_size)
|
||||
return
|
||||
else:
|
||||
if bar:
|
||||
bar.update_received(file_size)
|
||||
return
|
||||
else:
|
||||
if not is_part:
|
||||
if bar:
|
||||
bar.done()
|
||||
print('Overwriting %s' % tr(os.path.basename(filepath)), '...')
|
||||
elif not os.path.exists(os.path.dirname(filepath)):
|
||||
os.mkdir(os.path.dirname(filepath))
|
||||
if not is_part:
|
||||
if bar:
|
||||
bar.done()
|
||||
if not force and auto_rename:
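# --auto-rename: append or increment a ' (N)' suffix on the filename instead of overwriting or prompting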
|
||||
path, ext = os.path.basename(filepath).rsplit('.', 1)
|
||||
finder = re.compile(' \([1-9]\d*?\)$')
|
||||
if (finder.search(path) is None):
|
||||
thisfile = path + ' (1).' + ext
|
||||
else:
|
||||
def numreturn(a):
|
||||
return ' (' + str(int(a.group()[2:-1]) + 1) + ').'
|
||||
thisfile = finder.sub(numreturn, path) + ext
|
||||
filepath = os.path.join(os.path.dirname(filepath), thisfile)
|
||||
print('Changing name to %s' % tr(os.path.basename(filepath)), '...')
|
||||
continue_renameing = True
|
||||
continue
|
||||
if log.yes_or_no('File with this name already exists. Overwrite?'):
|
||||
log.w('Overwriting %s ...' % tr(os.path.basename(filepath)))
|
||||
else:
|
||||
return
|
||||
elif not os.path.exists(os.path.dirname(filepath)):
|
||||
os.mkdir(os.path.dirname(filepath))
|
||||
|
||||
temp_filepath = filepath + '.download' if file_size != float('inf') \
|
||||
else filepath
|
||||
@ -633,70 +690,78 @@ def url_save(
|
||||
else:
|
||||
open_mode = 'wb'
|
||||
|
||||
if received < file_size:
|
||||
if faker:
|
||||
tmp_headers = fake_headers
|
||||
'''
|
||||
if parameter headers passed in, we have it copied as tmp_header
|
||||
elif headers:
|
||||
headers = headers
|
||||
else:
|
||||
headers = {}
|
||||
'''
|
||||
if received:
|
||||
tmp_headers['Range'] = 'bytes=' + str(received) + '-'
|
||||
if refer:
|
||||
tmp_headers['Referer'] = refer
|
||||
for url in urls:
|
||||
received_chunk = 0
|
||||
if received < file_size:
|
||||
if faker:
|
||||
tmp_headers = fake_headers
|
||||
'''
|
||||
if parameter headers passed in, we have it copied as tmp_header
|
||||
elif headers:
|
||||
headers = headers
|
||||
else:
|
||||
headers = {}
|
||||
'''
|
||||
if received and not is_chunked: # only request a range when not chunked
|
||||
tmp_headers['Range'] = 'bytes=' + str(received) + '-'
|
||||
if refer:
|
||||
tmp_headers['Referer'] = refer
|
||||
|
||||
if timeout:
|
||||
response = urlopen_with_retry(
|
||||
request.Request(url, headers=tmp_headers), timeout=timeout
|
||||
)
|
||||
else:
|
||||
response = urlopen_with_retry(
|
||||
request.Request(url, headers=tmp_headers)
|
||||
)
|
||||
try:
|
||||
range_start = int(
|
||||
response.headers[
|
||||
'content-range'
|
||||
][6:].split('/')[0].split('-')[0]
|
||||
)
|
||||
end_length = int(
|
||||
response.headers['content-range'][6:].split('/')[1]
|
||||
)
|
||||
range_length = end_length - range_start
|
||||
except:
|
||||
content_length = response.headers['content-length']
|
||||
range_length = int(content_length) if content_length is not None \
|
||||
else float('inf')
|
||||
if timeout:
|
||||
response = urlopen_with_retry(
|
||||
request.Request(url, headers=tmp_headers), timeout=timeout
|
||||
)
|
||||
else:
|
||||
response = urlopen_with_retry(
|
||||
request.Request(url, headers=tmp_headers)
|
||||
)
|
||||
try:
|
||||
range_start = int(
|
||||
response.headers[
|
||||
'content-range'
|
||||
][6:].split('/')[0].split('-')[0]
|
||||
)
|
||||
end_length = int(
|
||||
response.headers['content-range'][6:].split('/')[1]
|
||||
)
|
||||
range_length = end_length - range_start
|
||||
except:
|
||||
content_length = response.headers['content-length']
|
||||
range_length = int(content_length) if content_length is not None \
|
||||
else float('inf')
|
||||
|
||||
if file_size != received + range_length:
|
||||
received = 0
|
||||
if bar:
|
||||
bar.received = 0
|
||||
open_mode = 'wb'
|
||||
|
||||
with open(temp_filepath, open_mode) as output:
|
||||
while True:
|
||||
buffer = None
|
||||
try:
|
||||
buffer = response.read(1024 * 256)
|
||||
except socket.timeout:
|
||||
pass
|
||||
if not buffer:
|
||||
if received == file_size: # Download finished
|
||||
break
|
||||
# Unexpected termination. Retry request
|
||||
tmp_headers['Range'] = 'bytes=' + str(received) + '-'
|
||||
response = urlopen_with_retry(
|
||||
request.Request(url, headers=tmp_headers)
|
||||
)
|
||||
continue
|
||||
output.write(buffer)
|
||||
received += len(buffer)
|
||||
if is_chunked: # always append if chunked
|
||||
open_mode = 'ab'
|
||||
elif file_size != received + range_length: # is it ever necessary?
|
||||
received = 0
|
||||
if bar:
|
||||
bar.update_received(len(buffer))
|
||||
bar.received = 0
|
||||
open_mode = 'wb'
|
||||
|
||||
with open(temp_filepath, open_mode) as output:
|
||||
while True:
|
||||
buffer = None
|
||||
try:
|
||||
buffer = response.read(1024 * 256)
|
||||
except socket.timeout:
|
||||
pass
|
||||
if not buffer:
|
||||
if is_chunked and received_chunk == range_length:
|
||||
break
|
||||
elif not is_chunked and received == file_size: # Download finished
|
||||
break
|
||||
# Unexpected termination. Retry request
|
||||
if not is_chunked: # when not chunked, re-request the remaining byte range and resume
|
||||
tmp_headers['Range'] = 'bytes=' + str(received) + '-'
|
||||
response = urlopen_with_retry(
|
||||
request.Request(url, headers=tmp_headers)
|
||||
)
|
||||
continue
|
||||
output.write(buffer)
|
||||
received += len(buffer)
|
||||
received_chunk += len(buffer)
|
||||
if bar:
|
||||
bar.update_received(len(buffer))
|
||||
|
||||
assert received == os.path.getsize(temp_filepath), '%s == %s == %s' % (
|
||||
received, os.path.getsize(temp_filepath), temp_filepath
|
||||
@ -820,13 +885,16 @@ class DummyProgressBar:
|
||||
pass
|
||||
|
||||
|
||||
def get_output_filename(urls, title, ext, output_dir, merge):
|
||||
def get_output_filename(urls, title, ext, output_dir, merge, **kwargs):
|
||||
# lame hack for the --output-filename option
|
||||
global output_filename
|
||||
if output_filename:
|
||||
result = output_filename
|
||||
if kwargs.get('part', -1) >= 0:
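# a non-negative 'part' index marks one piece of a multi-part download, e.g. 'title[02].ext'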
|
||||
result = '%s[%02d]' % (result, kwargs.get('part'))
|
||||
if ext:
|
||||
return output_filename + '.' + ext
|
||||
return output_filename
|
||||
result = '%s.%s' % (result, ext)
|
||||
return result
|
||||
|
||||
merged_ext = ext
|
||||
if (len(urls) > 1) and merge:
|
||||
@ -843,7 +911,11 @@ def get_output_filename(urls, title, ext, output_dir, merge):
|
||||
merged_ext = 'mkv'
|
||||
else:
|
||||
merged_ext = 'ts'
|
||||
return '%s.%s' % (title, merged_ext)
|
||||
result = title
|
||||
if kwargs.get('part', -1) >= 0:
|
||||
result = '%s[%02d]' % (result, kwargs.get('part'))
|
||||
result = '%s.%s' % (result, merged_ext)
|
||||
return result
|
||||
|
||||
def print_user_agent(faker=False):
|
||||
urllib_default_user_agent = 'Python-urllib/%d.%d' % sys.version_info[:2]
|
||||
@ -863,7 +935,10 @@ def download_urls(
|
||||
return
|
||||
if dry_run:
|
||||
print_user_agent(faker=faker)
|
||||
print('Real URLs:\n%s' % '\n'.join(urls))
|
||||
try:
|
||||
print('Real URLs:\n%s' % '\n'.join(urls))
|
||||
except:
|
||||
print('Real URLs:\n%s' % '\n'.join([j for i in urls for j in i]))
|
||||
return
|
||||
|
||||
if player:
|
||||
@ -883,9 +958,13 @@ def download_urls(
|
||||
output_filepath = os.path.join(output_dir, output_filename)
|
||||
|
||||
if total_size:
|
||||
if not force and os.path.exists(output_filepath) \
|
||||
and os.path.getsize(output_filepath) >= total_size * 0.9:
|
||||
print('Skipping %s: file already exists' % output_filepath)
|
||||
if not force and os.path.exists(output_filepath) and not auto_rename\
|
||||
and (os.path.getsize(output_filepath) >= total_size * 0.9\
|
||||
or skip_existing_file_size_check):
|
||||
if skip_existing_file_size_check:
|
||||
log.w('Skipping %s without checking size: file already exists' % output_filepath)
|
||||
else:
|
||||
log.w('Skipping %s: file already exists' % output_filepath)
|
||||
print()
|
||||
return
|
||||
bar = SimpleProgressBar(total_size, len(urls))
|
||||
@ -903,16 +982,16 @@ def download_urls(
|
||||
bar.done()
|
||||
else:
|
||||
parts = []
|
||||
print('Downloading %s.%s ...' % (tr(title), ext))
|
||||
print('Downloading %s ...' % tr(output_filename))
|
||||
bar.update()
|
||||
for i, url in enumerate(urls):
|
||||
filename = '%s[%02d].%s' % (title, i, ext)
|
||||
filepath = os.path.join(output_dir, filename)
|
||||
parts.append(filepath)
|
||||
output_filename_i = get_output_filename(urls, title, ext, output_dir, merge, part=i)
|
||||
output_filepath_i = os.path.join(output_dir, output_filename_i)
|
||||
parts.append(output_filepath_i)
|
||||
# print 'Downloading %s [%s/%s]...' % (tr(filename), i + 1, len(urls))
|
||||
bar.update_piece(i + 1)
|
||||
url_save(
|
||||
url, filepath, bar, refer=refer, is_part=True, faker=faker,
|
||||
url, output_filepath_i, bar, refer=refer, is_part=True, faker=faker,
|
||||
headers=headers, **kwargs
|
||||
)
|
||||
bar.done()
|
||||
@ -1225,27 +1304,89 @@ def download_main(download, download_playlist, urls, playlist, **kwargs):
|
||||
|
||||
def load_cookies(cookiefile):
|
||||
global cookies
|
||||
try:
|
||||
cookies = cookiejar.MozillaCookieJar(cookiefile)
|
||||
cookies.load()
|
||||
except Exception:
|
||||
import sqlite3
|
||||
if cookiefile.endswith('.txt'):
|
||||
# MozillaCookieJar treats prefix '#HttpOnly_' as comments incorrectly!
|
||||
# do not use its load()
|
||||
# see also:
|
||||
# - https://docs.python.org/3/library/http.cookiejar.html#http.cookiejar.MozillaCookieJar
|
||||
# - https://github.com/python/cpython/blob/4b219ce/Lib/http/cookiejar.py#L2014
|
||||
# - https://curl.haxx.se/libcurl/c/CURLOPT_COOKIELIST.html#EXAMPLE
|
||||
#cookies = cookiejar.MozillaCookieJar(cookiefile)
|
||||
#cookies.load()
|
||||
from http.cookiejar import Cookie
|
||||
cookies = cookiejar.MozillaCookieJar()
|
||||
con = sqlite3.connect(cookiefile)
|
||||
cur = con.cursor()
|
||||
try:
|
||||
cur.execute("""SELECT host, path, isSecure, expiry, name, value
|
||||
FROM moz_cookies""")
|
||||
for item in cur.fetchall():
|
||||
c = cookiejar.Cookie(
|
||||
0, item[4], item[5], None, False, item[0],
|
||||
item[0].startswith('.'), item[0].startswith('.'),
|
||||
item[1], False, item[2], item[3], item[3] == '', None,
|
||||
None, {},
|
||||
)
|
||||
now = time.time()
|
||||
ignore_discard, ignore_expires = False, False
|
||||
with open(cookiefile, 'r') as f:
|
||||
for line in f:
|
||||
# last field may be absent, so keep any trailing tab
|
||||
if line.endswith("\n"): line = line[:-1]
|
||||
|
||||
# skip comments and blank lines XXX what is $ for?
|
||||
if (line.strip().startswith(("#", "$")) or
|
||||
line.strip() == ""):
|
||||
if not line.strip().startswith('#HttpOnly_'): # skip for #HttpOnly_
|
||||
continue
|
||||
|
||||
domain, domain_specified, path, secure, expires, name, value = \
|
||||
line.split("\t")
|
||||
secure = (secure == "TRUE")
|
||||
domain_specified = (domain_specified == "TRUE")
|
||||
if name == "":
|
||||
# cookies.txt regards 'Set-Cookie: foo' as a cookie
|
||||
# with no name, whereas http.cookiejar regards it as a
|
||||
# cookie with no value.
|
||||
name = value
|
||||
value = None
|
||||
|
||||
initial_dot = domain.startswith(".")
|
||||
if not line.strip().startswith('#HttpOnly_'): # skip for #HttpOnly_
|
||||
assert domain_specified == initial_dot
|
||||
|
||||
discard = False
|
||||
if expires == "":
|
||||
expires = None
|
||||
discard = True
|
||||
|
||||
# assume path_specified is false
|
||||
c = Cookie(0, name, value,
|
||||
None, False,
|
||||
domain, domain_specified, initial_dot,
|
||||
path, False,
|
||||
secure,
|
||||
expires,
|
||||
discard,
|
||||
None,
|
||||
None,
|
||||
{})
|
||||
if not ignore_discard and c.discard:
|
||||
continue
|
||||
if not ignore_expires and c.is_expired(now):
|
||||
continue
|
||||
cookies.set_cookie(c)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
elif cookiefile.endswith(('.sqlite', '.sqlite3')):
|
||||
import sqlite3, shutil, tempfile
|
||||
temp_dir = tempfile.gettempdir()
|
||||
temp_cookiefile = os.path.join(temp_dir, 'temp_cookiefile.sqlite')
|
||||
shutil.copy2(cookiefile, temp_cookiefile)
|
||||
|
||||
cookies = cookiejar.MozillaCookieJar()
|
||||
con = sqlite3.connect(temp_cookiefile)
|
||||
cur = con.cursor()
|
||||
cur.execute("""SELECT host, path, isSecure, expiry, name, value
|
||||
FROM moz_cookies""")
|
||||
for item in cur.fetchall():
|
||||
c = cookiejar.Cookie(
|
||||
0, item[4], item[5], None, False, item[0],
|
||||
item[0].startswith('.'), item[0].startswith('.'),
|
||||
item[1], False, item[2], item[3], item[3] == '', None,
|
||||
None, {},
|
||||
)
|
||||
cookies.set_cookie(c)
|
||||
|
||||
else:
|
||||
log.e('[error] unsupported cookies format')
|
||||
# TODO: Chromium Cookies
|
||||
# SELECT host_key, path, secure, expires_utc, name, encrypted_value
|
||||
# FROM cookies
|
||||
@ -1332,6 +1473,10 @@ def script_main(download, download_playlist, **kwargs):
|
||||
'-f', '--force', action='store_true', default=False,
|
||||
help='Force overwriting existing files'
|
||||
)
|
||||
download_grp.add_argument(
|
||||
'--skip-existing-file-size-check', action='store_true', default=False,
|
||||
help='Skip existing file without checking file size'
|
||||
)
|
||||
download_grp.add_argument(
|
||||
'-F', '--format', metavar='STREAM_ID',
|
||||
help='Set video format to STREAM_ID'
|
||||
@ -1370,6 +1515,15 @@ def script_main(download, download_playlist, **kwargs):
|
||||
'-l', '--playlist', action='store_true',
|
||||
help='Prefer to download a playlist'
|
||||
)
|
||||
download_grp.add_argument(
|
||||
'-a', '--auto-rename', action='store_true', default=False,
|
||||
help='Auto rename same name different files'
|
||||
)
|
||||
|
||||
download_grp.add_argument(
|
||||
'-k', '--insecure', action='store_true', default=False,
|
||||
help='ignore ssl errors'
|
||||
)
|
||||
|
||||
proxy_grp = parser.add_argument_group('Proxy options')
|
||||
proxy_grp = proxy_grp.add_mutually_exclusive_group()
|
||||
@ -1409,16 +1563,24 @@ def script_main(download, download_playlist, **kwargs):
|
||||
logging.getLogger().setLevel(logging.DEBUG)
|
||||
|
||||
global force
|
||||
global skip_existing_file_size_check
|
||||
global dry_run
|
||||
global json_output
|
||||
global player
|
||||
global extractor_proxy
|
||||
global output_filename
|
||||
|
||||
global auto_rename
|
||||
global insecure
|
||||
output_filename = args.output_filename
|
||||
extractor_proxy = args.extractor_proxy
|
||||
|
||||
info_only = args.info
|
||||
if args.force:
|
||||
force = True
|
||||
if args.skip_existing_file_size_check:
|
||||
skip_existing_file_size_check = True
|
||||
if args.auto_rename:
|
||||
auto_rename = True
|
||||
if args.url:
|
||||
dry_run = True
|
||||
if args.json:
|
||||
@ -1438,6 +1600,11 @@ def script_main(download, download_playlist, **kwargs):
|
||||
player = args.player
|
||||
caption = False
|
||||
|
||||
if args.insecure:
|
||||
# ignore ssl
|
||||
insecure = True
|
||||
|
||||
|
||||
if args.no_proxy:
|
||||
set_http_proxy('')
|
||||
else:
|
||||
@ -1523,9 +1690,9 @@ def google_search(url):
|
||||
url = 'https://www.google.com/search?tbm=vid&q=%s' % parse.quote(keywords)
|
||||
page = get_content(url, headers=fake_headers)
|
||||
videos = re.findall(
|
||||
r'<a href="(https?://[^"]+)" onmousedown="[^"]+">([^<]+)<', page
|
||||
r'<a href="(https?://[^"]+)" onmousedown="[^"]+"><h3 class="[^"]*">([^<]+)<', page
|
||||
)
|
||||
vdurs = re.findall(r'<span class="vdur _dwc">([^<]+)<', page)
|
||||
vdurs = re.findall(r'<span class="vdur[^"]*">([^<]+)<', page)
|
||||
durs = [r1(r'(\d+:\d+)', unescape_html(dur)) for dur in vdurs]
|
||||
print('Google Videos search:')
|
||||
for v in zip(videos, durs):
|
||||
@ -1554,6 +1721,11 @@ def url_to_module(url):
|
||||
domain = r1(r'(\.[^.]+\.[^.]+)$', video_host) or video_host
|
||||
assert domain, 'unsupported url: ' + url
|
||||
|
||||
# all non-ASCII code points must be quoted (percent-encoded UTF-8)
|
||||
url = ''.join([ch if ord(ch) in range(128) else parse.quote(ch) for ch in url])
|
||||
video_host = r1(r'https?://([^/]+)/', url)
|
||||
video_url = r1(r'https?://[^/]+(.*)', url)
|
||||
|
||||
k = r1(r'([^.]+)', domain)
|
||||
if k in SITES:
|
||||
return (
|
||||
@ -1561,15 +1733,11 @@ def url_to_module(url):
|
||||
url
|
||||
)
|
||||
else:
|
||||
import http.client
|
||||
video_host = r1(r'https?://([^/]+)/', url) # .cn could be removed
|
||||
if url.startswith('https://'):
|
||||
conn = http.client.HTTPSConnection(video_host)
|
||||
else:
|
||||
conn = http.client.HTTPConnection(video_host)
|
||||
conn.request('HEAD', video_url, headers=fake_headers)
|
||||
res = conn.getresponse()
|
||||
location = res.getheader('location')
|
||||
try:
|
||||
location = get_location(url) # t.co isn't happy with fake_headers
|
||||
except:
|
||||
location = get_location(url, headers=fake_headers)
|
||||
|
||||
if location and location != url and not location.startswith('/'):
|
||||
return url_to_module(location)
|
||||
else:
|
||||
|
@ -1,10 +1,11 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
from .common import match1, maybe_print, download_urls, get_filename, parse_host, set_proxy, unset_proxy, get_content, dry_run
|
||||
from .common import match1, maybe_print, download_urls, get_filename, parse_host, set_proxy, unset_proxy, get_content, dry_run, player
|
||||
from .common import print_more_compatible as print
|
||||
from .util import log
|
||||
from . import json_output
|
||||
import os
|
||||
import sys
|
||||
|
||||
class Extractor():
|
||||
def __init__(self, *args):
|
||||
@ -32,7 +33,8 @@ class VideoExtractor():
|
||||
self.out = False
|
||||
self.ua = None
|
||||
self.referer = None
|
||||
self.danmuku = None
|
||||
self.danmaku = None
|
||||
self.lyrics = None
|
||||
|
||||
if args:
|
||||
self.url = args[0]
|
||||
@ -105,7 +107,7 @@ class VideoExtractor():
|
||||
if 'quality' in stream:
|
||||
print(" quality: %s" % stream['quality'])
|
||||
|
||||
if 'size' in stream and stream['container'].lower() != 'm3u8':
|
||||
if 'size' in stream and 'container' in stream and stream['container'].lower() != 'm3u8':
|
||||
if stream['size'] != float('inf') and stream['size'] != 0:
|
||||
print(" size: %s MiB (%s bytes)" % (round(stream['size'] / 1048576, 1), stream['size']))
|
||||
|
||||
@ -130,6 +132,8 @@ class VideoExtractor():
|
||||
print(" url: %s" % self.url)
|
||||
print()
|
||||
|
||||
sys.stdout.flush()
|
||||
|
||||
def p(self, stream_id=None):
|
||||
maybe_print("site: %s" % self.__class__.name)
|
||||
maybe_print("title: %s" % self.title)
|
||||
@ -154,9 +158,10 @@ class VideoExtractor():
|
||||
for stream in itags:
|
||||
self.p_stream(stream)
|
||||
# Print all other available streams
|
||||
print(" [ DEFAULT ] %s" % ('_' * 33))
|
||||
for stream in self.streams_sorted:
|
||||
self.p_stream(stream['id'] if 'id' in stream else stream['itag'])
|
||||
if self.streams_sorted:
|
||||
print(" [ DEFAULT ] %s" % ('_' * 33))
|
||||
for stream in self.streams_sorted:
|
||||
self.p_stream(stream['id'] if 'id' in stream else stream['itag'])
|
||||
|
||||
if self.audiolang:
|
||||
print("audio-languages:")
|
||||
@ -164,6 +169,8 @@ class VideoExtractor():
|
||||
print(" - lang: {}".format(i['lang']))
|
||||
print(" download-url: {}\n".format(i['url']))
|
||||
|
||||
sys.stdout.flush()
|
||||
|
||||
def p_playlist(self, stream_id=None):
|
||||
maybe_print("site: %s" % self.__class__.name)
|
||||
print("playlist: %s" % self.title)
|
||||
@ -195,7 +202,13 @@ class VideoExtractor():
|
||||
else:
|
||||
# Download stream with the best quality
|
||||
from .processor.ffmpeg import has_ffmpeg_installed
|
||||
stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
|
||||
if has_ffmpeg_installed() and player is None and self.dash_streams or not self.streams_sorted:
|
||||
#stream_id = list(self.dash_streams)[-1]
|
||||
itags = sorted(self.dash_streams,
|
||||
key=lambda i: -self.dash_streams[i]['size'])
|
||||
stream_id = itags[0]
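# pick the DASH stream with the largest size, i.e. the best available quality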
|
||||
else:
|
||||
stream_id = self.streams_sorted[0]['id'] if 'id' in self.streams_sorted[0] else self.streams_sorted[0]['itag']
|
||||
|
||||
if 'index' not in kwargs:
|
||||
self.p(stream_id)
|
||||
@ -211,7 +224,7 @@ class VideoExtractor():
|
||||
ext = self.dash_streams[stream_id]['container']
|
||||
total_size = self.dash_streams[stream_id]['size']
|
||||
|
||||
if ext == 'm3u8':
|
||||
if ext == 'm3u8' or ext == 'm4a':
|
||||
ext = 'mp4'
|
||||
|
||||
if not urls:
|
||||
@ -226,9 +239,11 @@ class VideoExtractor():
|
||||
output_dir=kwargs['output_dir'],
|
||||
merge=kwargs['merge'],
|
||||
av=stream_id in self.dash_streams)
|
||||
|
||||
if 'caption' not in kwargs or not kwargs['caption']:
|
||||
print('Skipping captions or danmuku.')
|
||||
print('Skipping captions or danmaku.')
|
||||
return
|
||||
|
||||
for lang in self.caption_tracks:
|
||||
filename = '%s.%s.srt' % (get_filename(self.title), lang)
|
||||
print('Saving %s ... ' % filename, end="", flush=True)
|
||||
@ -237,11 +252,18 @@ class VideoExtractor():
|
||||
'w', encoding='utf-8') as x:
|
||||
x.write(srt)
|
||||
print('Done.')
|
||||
if self.danmuku is not None and not dry_run:
|
||||
|
||||
if self.danmaku is not None and not dry_run:
|
||||
filename = '{}.cmt.xml'.format(get_filename(self.title))
|
||||
print('Downloading {} ...\n'.format(filename))
|
||||
with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp:
|
||||
fp.write(self.danmuku)
|
||||
fp.write(self.danmaku)
|
||||
|
||||
if self.lyrics is not None and not dry_run:
|
||||
filename = '{}.lrc'.format(get_filename(self.title))
|
||||
print('Downloading {} ...\n'.format(filename))
|
||||
with open(os.path.join(kwargs['output_dir'], filename), 'w', encoding='utf8') as fp:
|
||||
fp.write(self.lyrics)
|
||||
|
||||
# For main_dev()
|
||||
#download_urls(urls, self.title, self.streams[stream_id]['container'], self.streams[stream_id]['size'])
|
||||
|
@ -13,20 +13,17 @@ from .ckplayer import *
|
||||
from .cntv import *
|
||||
from .coub import *
|
||||
from .dailymotion import *
|
||||
from .dilidili import *
|
||||
from .douban import *
|
||||
from .douyin import *
|
||||
from .douyutv import *
|
||||
from .ehow import *
|
||||
from .facebook import *
|
||||
from .fantasy import *
|
||||
from .fc2video import *
|
||||
from .flickr import *
|
||||
from .freesound import *
|
||||
from .funshion import *
|
||||
from .google import *
|
||||
from .heavymusic import *
|
||||
from .huaban import *
|
||||
from .icourses import *
|
||||
from .ifeng import *
|
||||
from .imgur import *
|
||||
@ -41,6 +38,7 @@ from .kugou import *
|
||||
from .kuwo import *
|
||||
from .le import *
|
||||
from .lizhi import *
|
||||
from .longzhu import *
|
||||
from .magisto import *
|
||||
from .metacafe import *
|
||||
from .mgtv import *
|
||||
@ -53,7 +51,6 @@ from .nanagogo import *
|
||||
from .naver import *
|
||||
from .netease import *
|
||||
from .nicovideo import *
|
||||
from .panda import *
|
||||
from .pinterest import *
|
||||
from .pixnet import *
|
||||
from .pptv import *
|
||||
@ -66,6 +63,7 @@ from .sohu import *
|
||||
from .soundcloud import *
|
||||
from .suntv import *
|
||||
from .theplatform import *
|
||||
from .tiktok import *
|
||||
from .tucao import *
|
||||
from .tudou import *
|
||||
from .tumblr import *
|
||||
@ -87,3 +85,5 @@ from .ted import *
|
||||
from .khan import *
|
||||
from .zhanqi import *
|
||||
from .kuaishou import *
|
||||
from .zhibo import *
|
||||
from .zhihu import *
|
||||
|
@ -65,7 +65,7 @@ def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=Fals
|
||||
elif sourceType == 'tudou':
|
||||
tudou_download_by_iid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
elif sourceType == 'qq':
|
||||
qq_download_by_vid(sourceId, title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
qq_download_by_vid(sourceId, title, True, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
elif sourceType == 'letv':
|
||||
letvcloud_download_by_vu(sourceId, '2d8c027396', title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
elif sourceType == 'zhuzhan':
|
||||
@ -85,9 +85,13 @@ def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=Fals
|
||||
_, _, seg_size = url_info(url)
|
||||
size += seg_size
|
||||
#fallback to flvhd is not quite possible
|
||||
print_info(site_info, title, 'mp4', size)
|
||||
if re.search(r'fid=[0-9A-Z\-]*.flv', preferred[0][0]):
|
||||
ext = 'flv'
|
||||
else:
|
||||
ext = 'mp4'
|
||||
print_info(site_info, title, ext, size)
|
||||
if not info_only:
|
||||
download_urls(preferred[0], title, 'mp4', size, output_dir=output_dir, merge=merge)
|
||||
download_urls(preferred[0], title, ext, size, output_dir=output_dir, merge=merge)
|
||||
else:
|
||||
raise NotImplementedError(sourceType)
|
||||
|
||||
@ -105,27 +109,46 @@ def acfun_download_by_vid(vid, title, output_dir='.', merge=True, info_only=Fals
|
||||
pass
|
||||
|
||||
def acfun_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
assert re.match(r'http://[^\.]*\.*acfun\.[^\.]+/\D/\D\D(\d+)', url)
|
||||
html = get_content(url)
|
||||
assert re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/(\D|bangumi)/\D\D(\d+)', url)
|
||||
|
||||
title = r1(r'data-title="([^"]+)"', html)
|
||||
if re.match(r'https?://[^\.]*\.*acfun\.[^\.]+/\D/\D\D(\d+)', url):
|
||||
html = get_content(url)
|
||||
json_text = match1(html, r"(?s)videoInfo\s*=\s*(\{.*?\});")
|
||||
json_data = json.loads(json_text)
|
||||
vid = json_data.get('currentVideoInfo').get('id')
|
||||
up = json_data.get('user').get('name')
|
||||
title = json_data.get('title')
|
||||
video_list = json_data.get('videoList')
|
||||
if len(video_list) > 1:
|
||||
title += " - " + [p.get('title') for p in video_list if p.get('id') == vid][0]
|
||||
# bangumi
|
||||
elif re.match("https?://[^\.]*\.*acfun\.[^\.]+/bangumi/ab(\d+)", url):
|
||||
html = get_content(url)
|
||||
tag_script = match1(html, r'<script>window\.pageInfo([^<]+)</script>')
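# the bangumi page embeds its metadata as a JavaScript assignment: window.pageInfo = {...};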
|
||||
json_text = tag_script[tag_script.find('{') : tag_script.find('};') + 1]
|
||||
json_data = json.loads(json_text)
|
||||
title = json_data['bangumiTitle'] + " " + json_data['episodeName'] + " " + json_data['title']
|
||||
vid = str(json_data['videoId'])
|
||||
up = "acfun"
|
||||
else:
|
||||
raise NotImplementedError
|
||||
|
||||
assert title and vid
|
||||
title = unescape_html(title)
|
||||
title = escape_file_path(title)
|
||||
assert title
|
||||
if match1(url, r'_(\d+)$'): # current P
|
||||
title = title + " " + r1(r'active">([^<]*)', html)
|
||||
|
||||
vid = r1('data-vid="(\d+)"', html)
|
||||
up = r1('data-name="([^"]+)"', html)
|
||||
p_title = r1('active">([^<]+)', html)
|
||||
title = '%s (%s)' % (title, up)
|
||||
if p_title: title = '%s - %s' % (title, p_title)
|
||||
if p_title:
|
||||
title = '%s - %s' % (title, p_title)
|
||||
|
||||
|
||||
acfun_download_by_vid(vid, title,
|
||||
output_dir=output_dir,
|
||||
merge=merge,
|
||||
info_only=info_only,
|
||||
**kwargs)
|
||||
|
||||
site_info = "AcFun.tv"
|
||||
|
||||
site_info = "AcFun.cn"
|
||||
download = acfun_download
|
||||
download_playlist = playlist_not_supported('acfun')
|
||||
|
@ -38,7 +38,7 @@ def baidu_get_song_title(data):

def baidu_get_song_lyric(data):
lrc = data['lrcLink']
return None if lrc is '' else "http://music.baidu.com%s" % lrc
return "http://music.baidu.com%s" % lrc if lrc else None


def baidu_download_song(sid, output_dir='.', merge=True, info_only=False):
@ -123,12 +123,22 @@ def baidu_download(url, output_dir='.', stream_type=None, merge=True, info_only=
elif re.match('http://tieba.baidu.com/', url):
try:
# embedded videos
embed_download(url, output_dir, merge=merge, info_only=info_only)
embed_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
except:
# images
html = get_html(url)
title = r1(r'title:"([^"]+)"', html)

vhsrc = re.findall(r'"BDE_Image"[^>]+src="([^"]+\.mp4)"', html) or \
re.findall(r'vhsrc="([^"]+)"', html)
if len(vhsrc) > 0:
ext = 'mp4'
size = url_size(vhsrc[0])
print_info(site_info, title, ext, size)
if not info_only:
download_urls(vhsrc, title, ext, size,
output_dir=output_dir, merge=False)

items = re.findall(
r'//imgsrc.baidu.com/forum/w[^"]+/([^/"]+)', html)
urls = ['http://imgsrc.baidu.com/forum/pic/item/' + i
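The lyric helper rewritten above replaces the fragile `lrc is ''` identity test with a plain truthiness check, so both an empty and a missing `lrcLink` yield `None`. A small sketch with hypothetical data:

```
def baidu_get_song_lyric(data):
    # empty or missing lrcLink means there is no lyric file to fetch
    lrc = data.get('lrcLink')
    return "http://music.baidu.com%s" % lrc if lrc else None

print(baidu_get_song_lyric({'lrcLink': '/data/lrc/1.lrc'}))  # http://music.baidu.com/data/lrc/1.lrc
print(baidu_get_song_lyric({'lrcLink': ''}))                 # None
```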
@ -1,362 +1,573 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['bilibili_download']
|
||||
from ..common import *
|
||||
from ..extractor import VideoExtractor
|
||||
|
||||
import hashlib
|
||||
import re
|
||||
import time
|
||||
import json
|
||||
import http.cookiejar
|
||||
import urllib.request
|
||||
import urllib.parse
|
||||
from xml.dom.minidom import parseString
|
||||
|
||||
from ..common import *
|
||||
from ..util.log import *
|
||||
from ..extractor import *
|
||||
|
||||
from .qq import qq_download_by_vid
|
||||
from .sina import sina_download_by_vid
|
||||
from .tudou import tudou_download_by_id
|
||||
from .youku import youku_download_by_vid
|
||||
|
||||
class Bilibili(VideoExtractor):
|
||||
name = 'Bilibili'
|
||||
live_api = 'http://live.bilibili.com/api/playurl?cid={}&otype=json'
|
||||
api_url = 'http://interface.bilibili.com/playurl?'
|
||||
bangumi_api_url = 'http://bangumi.bilibili.com/player/web_api/playurl?'
|
||||
live_room_init_api_url = 'https://api.live.bilibili.com/room/v1/Room/room_init?id={}'
|
||||
live_room_info_api_url = 'https://api.live.bilibili.com/room/v1/Room/get_info?room_id={}'
|
||||
name = "Bilibili"
|
||||
|
||||
SEC1 = '1c15888dc316e05a15fdd0a02ed6584f'
|
||||
SEC2 = '9b288147e5474dd2aa67085f716c560d'
|
||||
# Bilibili media encoding options, in descending quality order.
|
||||
stream_types = [
|
||||
{'id': 'hdflv'},
|
||||
{'id': 'flv720'},
|
||||
{'id': 'flv'},
|
||||
{'id': 'hdmp4'},
|
||||
{'id': 'mp4'},
|
||||
{'id': 'live'},
|
||||
{'id': 'vc'}
|
||||
{'id': 'flv_p60', 'quality': 116, 'audio_quality': 30280,
|
||||
'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P60'},
|
||||
{'id': 'hdflv2', 'quality': 112, 'audio_quality': 30280,
|
||||
'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P+'},
|
||||
{'id': 'flv', 'quality': 80, 'audio_quality': 30280,
|
||||
'container': 'FLV', 'video_resolution': '1080p', 'desc': '高清 1080P'},
|
||||
{'id': 'flv720_p60', 'quality': 74, 'audio_quality': 30280,
|
||||
'container': 'FLV', 'video_resolution': '720p', 'desc': '高清 720P60'},
|
||||
{'id': 'flv720', 'quality': 64, 'audio_quality': 30280,
|
||||
'container': 'FLV', 'video_resolution': '720p', 'desc': '高清 720P'},
|
||||
{'id': 'hdmp4', 'quality': 48, 'audio_quality': 30280,
|
||||
'container': 'MP4', 'video_resolution': '720p', 'desc': '高清 720P (MP4)'},
|
||||
{'id': 'flv480', 'quality': 32, 'audio_quality': 30280,
|
||||
'container': 'FLV', 'video_resolution': '480p', 'desc': '清晰 480P'},
|
||||
{'id': 'flv360', 'quality': 16, 'audio_quality': 30216,
|
||||
'container': 'FLV', 'video_resolution': '360p', 'desc': '流畅 360P'},
|
||||
# 'quality': 15?
|
||||
{'id': 'mp4', 'quality': 0},
|
||||
]
|
||||
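The expanded `stream_types` table above is keyed by Bilibili's numeric quality codes; later in `prepare` the extractor turns it into a lookup dict (`self.stream_qualities`). A minimal sketch of that lookup, with a trimmed copy of the table:

```
stream_types = [
    {'id': 'flv_p60', 'quality': 116, 'container': 'FLV', 'desc': '高清 1080P60'},
    {'id': 'flv',     'quality': 80,  'container': 'FLV', 'desc': '高清 1080P'},
    {'id': 'flv720',  'quality': 64,  'container': 'FLV', 'desc': '高清 720P'},
    {'id': 'flv360',  'quality': 16,  'container': 'FLV', 'desc': '流畅 360P'},
]

# quality code -> stream entry, as the extractor builds self.stream_qualities
stream_qualities = {s['quality']: s for s in stream_types}
assert stream_qualities[64]['id'] == 'flv720'
```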
fmt2qlt = dict(hdflv=4, flv=3, hdmp4=2, mp4=1)
|
||||
|
||||
@staticmethod
|
||||
def bilibili_stream_type(urls):
|
||||
url = urls[0]
|
||||
if 'hd.flv' in url or '-112.flv' in url:
|
||||
return 'hdflv', 'flv'
|
||||
if '-64.flv' in url:
|
||||
return 'flv720', 'flv'
|
||||
if '.flv' in url:
|
||||
return 'flv', 'flv'
|
||||
if 'hd.mp4' in url or '-48.mp4' in url:
|
||||
return 'hdmp4', 'mp4'
|
||||
if '.mp4' in url:
|
||||
return 'mp4', 'mp4'
|
||||
raise Exception('Unknown stream type')
|
||||
|
||||
def api_req(self, cid, quality, bangumi, bangumi_movie=False, **kwargs):
|
||||
ts = str(int(time.time()))
|
||||
if not bangumi:
|
||||
params_str = 'cid={}&player=1&quality={}&ts={}'.format(cid, quality, ts)
|
||||
chksum = hashlib.md5(bytes(params_str+self.SEC1, 'utf8')).hexdigest()
|
||||
api_url = self.api_url + params_str + '&sign=' + chksum
|
||||
def height_to_quality(height):
|
||||
if height <= 360:
|
||||
return 16
|
||||
elif height <= 480:
|
||||
return 32
|
||||
elif height <= 720:
|
||||
return 64
|
||||
else:
|
||||
mod = 'movie' if bangumi_movie else 'bangumi'
|
||||
params_str = 'cid={}&module={}&player=1&quality={}&ts={}'.format(cid, mod, quality, ts)
|
||||
chksum = hashlib.md5(bytes(params_str+self.SEC2, 'utf8')).hexdigest()
|
||||
api_url = self.bangumi_api_url + params_str + '&sign=' + chksum
|
||||
return 80
|
||||
|
||||
xml_str = get_content(api_url, headers={'referer': self.url, 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'})
|
||||
return xml_str
|
||||
@staticmethod
|
||||
def bilibili_headers(referer=None, cookie=None):
|
||||
# a reasonable UA
|
||||
ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'
|
||||
headers = {'User-Agent': ua}
|
||||
if referer is not None:
|
||||
headers.update({'Referer': referer})
|
||||
if cookie is not None:
|
||||
headers.update({'Cookie': cookie})
|
||||
return headers
|
||||
|
||||
def parse_bili_xml(self, xml_str):
|
||||
urls_list = []
|
||||
total_size = 0
|
||||
doc = parseString(xml_str.encode('utf8'))
|
||||
durls = doc.getElementsByTagName('durl')
|
||||
for durl in durls:
|
||||
size = durl.getElementsByTagName('size')[0]
|
||||
total_size += int(size.firstChild.nodeValue)
|
||||
url = durl.getElementsByTagName('url')[0]
|
||||
urls_list.append(url.firstChild.nodeValue)
|
||||
stream_type, container = self.bilibili_stream_type(urls_list)
|
||||
if stream_type not in self.streams:
|
||||
self.streams[stream_type] = {}
|
||||
self.streams[stream_type]['src'] = urls_list
|
||||
self.streams[stream_type]['size'] = total_size
|
||||
self.streams[stream_type]['container'] = container
|
||||
@staticmethod
|
||||
def bilibili_api(avid, cid, qn=0):
|
||||
return 'https://api.bilibili.com/x/player/playurl?avid=%s&cid=%s&qn=%s&type=&otype=json&fnver=0&fnval=16' % (avid, cid, qn)
|
||||
|
||||
def download_by_vid(self, cid, bangumi, **kwargs):
|
||||
stream_id = kwargs.get('stream_id')
|
||||
# guard here. if stream_id invalid, fallback as not stream_id
|
||||
if stream_id and stream_id in self.fmt2qlt:
|
||||
quality = stream_id
|
||||
else:
|
||||
quality = 'hdflv' if bangumi else 'flv'
|
||||
@staticmethod
|
||||
def bilibili_audio_api(sid):
|
||||
return 'https://www.bilibili.com/audio/music-service-c/web/url?sid=%s' % sid
|
||||
|
||||
info_only = kwargs.get('info_only')
|
||||
for qlt in range(4, -1, -1):
|
||||
api_xml = self.api_req(cid, qlt, bangumi, **kwargs)
|
||||
self.parse_bili_xml(api_xml)
|
||||
if not info_only or stream_id:
|
||||
self.danmuku = get_danmuku_xml(cid)
|
||||
@staticmethod
|
||||
def bilibili_audio_info_api(sid):
|
||||
return 'https://www.bilibili.com/audio/music-service-c/web/song/info?sid=%s' % sid
|
||||
|
||||
@staticmethod
|
||||
def bilibili_audio_menu_info_api(sid):
|
||||
return 'https://www.bilibili.com/audio/music-service-c/web/menu/info?sid=%s' % sid
|
||||
|
||||
@staticmethod
|
||||
def bilibili_audio_menu_song_api(sid, ps=100):
|
||||
return 'https://www.bilibili.com/audio/music-service-c/web/song/of-menu?sid=%s&pn=1&ps=%s' % (sid, ps)
|
||||
|
||||
@staticmethod
|
||||
def bilibili_bangumi_api(avid, cid, ep_id, qn=0):
|
||||
return 'https://api.bilibili.com/pgc/player/web/playurl?avid=%s&cid=%s&qn=%s&type=&otype=json&ep_id=%s&fnver=0&fnval=16' % (avid, cid, qn, ep_id)
|
||||
|
||||
@staticmethod
|
||||
def bilibili_interface_api(cid, qn=0):
|
||||
entropy = 'rbMCKn@KuamXWlPMoJGsKcbiJKUfkPF_8dABscJntvqhRSETg'
|
||||
appkey, sec = ''.join([chr(ord(i) + 2) for i in entropy[::-1]]).split(':')
|
||||
params = 'appkey=%s&cid=%s&otype=json&qn=%s&quality=%s&type=' % (appkey, cid, qn, qn)
|
||||
chksum = hashlib.md5(bytes(params + sec, 'utf8')).hexdigest()
|
||||
return 'https://interface.bilibili.com/v2/playurl?%s&sign=%s' % (params, chksum)
|
||||
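`bilibili_interface_api` above recovers an appkey and secret from the obfuscated `entropy` string (reverse it, shift every character up by 2, split on `:`) and signs the query string with MD5. The same steps in isolation, with a hypothetical cid/qn pair:

```
import hashlib

entropy = 'rbMCKn@KuamXWlPMoJGsKcbiJKUfkPF_8dABscJntvqhRSETg'
appkey, sec = ''.join(chr(ord(c) + 2) for c in entropy[::-1]).split(':')

cid, qn = 170001, 80  # hypothetical values for illustration
params = 'appkey=%s&cid=%s&otype=json&qn=%s&quality=%s&type=' % (appkey, cid, qn, qn)
sign = hashlib.md5((params + sec).encode('utf8')).hexdigest()
api_url = 'https://interface.bilibili.com/v2/playurl?%s&sign=%s' % (params, sign)
```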
|
||||
@staticmethod
|
||||
def bilibili_live_api(cid):
|
||||
return 'https://api.live.bilibili.com/room/v1/Room/playUrl?cid=%s&quality=0&platform=web' % cid
|
||||
|
||||
@staticmethod
|
||||
def bilibili_live_room_info_api(room_id):
|
||||
return 'https://api.live.bilibili.com/room/v1/Room/get_info?room_id=%s' % room_id
|
||||
|
||||
@staticmethod
|
||||
def bilibili_live_room_init_api(room_id):
|
||||
return 'https://api.live.bilibili.com/room/v1/Room/room_init?id=%s' % room_id
|
||||
|
||||
@staticmethod
|
||||
def bilibili_space_channel_api(mid, cid, pn=1, ps=100):
|
||||
return 'https://api.bilibili.com/x/space/channel/video?mid=%s&cid=%s&pn=%s&ps=%s&order=0&jsonp=jsonp' % (mid, cid, pn, ps)
|
||||
|
||||
@staticmethod
|
||||
def bilibili_space_favlist_api(vmid, fid, pn=1, ps=100):
|
||||
return 'https://api.bilibili.com/x/space/fav/arc?vmid=%s&fid=%s&pn=%s&ps=%s&order=0&jsonp=jsonp' % (vmid, fid, pn, ps)
|
||||
|
||||
@staticmethod
|
||||
def bilibili_space_video_api(mid, pn=1, ps=100):
|
||||
return 'https://space.bilibili.com/ajax/member/getSubmitVideos?mid=%s&page=%s&pagesize=%s&order=0&jsonp=jsonp' % (mid, pn, ps)
|
||||
|
||||
@staticmethod
|
||||
def bilibili_vc_api(video_id):
|
||||
return 'https://api.vc.bilibili.com/clip/v1/video/detail?video_id=%s' % video_id
|
||||
|
||||
@staticmethod
|
||||
def url_size(url, faker=False, headers={},err_value=0):
|
||||
try:
|
||||
return url_size(url,faker,headers)
|
||||
except:
|
||||
return err_value
|
||||
|
||||
def prepare(self, **kwargs):
|
||||
if socket.getdefaulttimeout() == 600: # no timeout specified
|
||||
socket.setdefaulttimeout(2) # fail fast, very speedy!
|
||||
|
||||
# handle "watchlater" URLs
|
||||
if '/watchlater/' in self.url:
|
||||
aid = re.search(r'av(\d+)', self.url).group(1)
|
||||
self.url = 'http://www.bilibili.com/video/av{}/'.format(aid)
|
||||
|
||||
self.ua = fake_headers['User-Agent']
|
||||
self.url = url_locations([self.url])[0]
|
||||
frag = urllib.parse.urlparse(self.url).fragment
|
||||
# http://www.bilibili.com/video/av3141144/index_2.html#page=3
|
||||
if frag:
|
||||
hit = re.search(r'page=(\d+)', frag)
|
||||
if hit is not None:
|
||||
page = hit.group(1)
|
||||
aid = re.search(r'av(\d+)', self.url).group(1)
|
||||
self.url = 'http://www.bilibili.com/video/av{}/index_{}.html'.format(aid, page)
|
||||
self.referer = self.url
|
||||
self.page = get_content(self.url)
|
||||
|
||||
m = re.search(r'<h1.*?>(.*?)</h1>', self.page) or re.search(r'<h1 title="([^"]+)">', self.page)
|
||||
if m is not None:
|
||||
self.title = m.group(1)
|
||||
if self.title is None:
|
||||
m = re.search(r'property="og:title" content="([^"]+)"', self.page)
|
||||
if m is not None:
|
||||
self.title = m.group(1)
|
||||
if 'subtitle' in kwargs:
|
||||
subtitle = kwargs['subtitle']
|
||||
self.title = '{} {}'.format(self.title, subtitle)
|
||||
|
||||
if 'bangumi.bilibili.com/movie' in self.url:
|
||||
self.movie_entry(**kwargs)
|
||||
elif 'bangumi.bilibili.com' in self.url:
|
||||
self.bangumi_entry(**kwargs)
|
||||
elif 'bangumi/' in self.url:
|
||||
self.bangumi_entry(**kwargs)
|
||||
elif 'live.bilibili.com' in self.url:
|
||||
self.live_entry(**kwargs)
|
||||
elif 'vc.bilibili.com' in self.url:
|
||||
self.vc_entry(**kwargs)
|
||||
else:
|
||||
self.entry(**kwargs)
|
||||
|
||||
def movie_entry(self, **kwargs):
|
||||
patt = r"var\s*aid\s*=\s*'(\d+)'"
|
||||
aid = re.search(patt, self.page).group(1)
|
||||
page_list = json.loads(get_content('http://www.bilibili.com/widget/getPageList?aid={}'.format(aid)))
|
||||
# better ideas for bangumi_movie titles?
|
||||
self.title = page_list[0]['pagename']
|
||||
self.download_by_vid(page_list[0]['cid'], True, bangumi_movie=True, **kwargs)
|
||||
|
||||
def entry(self, **kwargs):
|
||||
# tencent player
|
||||
tc_flashvars = re.search(r'"bili-cid=\d+&bili-aid=\d+&vid=([^"]+)"', self.page)
|
||||
if tc_flashvars:
|
||||
tc_flashvars = tc_flashvars.group(1)
|
||||
if tc_flashvars is not None:
|
||||
self.out = True
|
||||
qq_download_by_vid(tc_flashvars, self.title, output_dir=kwargs['output_dir'], merge=kwargs['merge'], info_only=kwargs['info_only'])
|
||||
return
|
||||
|
||||
has_plist = re.search(r'<option', self.page)
|
||||
if has_plist and r1('index_(\d+).html', self.url) is None:
|
||||
log.w('This page contains a playlist. (use --playlist to download all videos.)')
|
||||
self.stream_qualities = {s['quality']: s for s in self.stream_types}
|
||||
|
||||
try:
|
||||
cid = re.search(r'cid=(\d+)', self.page).group(1)
|
||||
html_content = get_content(self.url, headers=self.bilibili_headers())
|
||||
except:
|
||||
cid = re.search(r'"cid":(\d+)', self.page).group(1)
|
||||
if cid is not None:
|
||||
self.download_by_vid(cid, re.search('bangumi', self.url) is not None, **kwargs)
|
||||
html_content = '' # live always returns 400 (why?)
|
||||
#self.title = match1(html_content,
|
||||
# r'<h1 title="([^"]+)"')
|
||||
|
||||
# redirect: watchlater
|
||||
if re.match(r'https?://(www\.)?bilibili\.com/watchlater/#/av(\d+)', self.url):
|
||||
avid = match1(self.url, r'/av(\d+)')
|
||||
p = int(match1(self.url, r'/p(\d+)') or '1')
|
||||
self.url = 'https://www.bilibili.com/video/av%s?p=%s' % (avid, p)
|
||||
html_content = get_content(self.url, headers=self.bilibili_headers())
|
||||
|
||||
# redirect: bangumi/play/ss -> bangumi/play/ep
|
||||
# redirect: bangumi.bilibili.com/anime -> bangumi/play/ep
|
||||
elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ss(\d+)', self.url) or \
|
||||
re.match(r'https?://bangumi\.bilibili\.com/anime/(\d+)/play', self.url):
|
||||
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
|
||||
initial_state = json.loads(initial_state_text)
|
||||
ep_id = initial_state['epList'][0]['id']
|
||||
self.url = 'https://www.bilibili.com/bangumi/play/ep%s' % ep_id
|
||||
html_content = get_content(self.url, headers=self.bilibili_headers())
|
||||
|
||||
# sort it out
|
||||
if re.match(r'https?://(www\.)?bilibili\.com/audio/au(\d+)', self.url):
|
||||
sort = 'audio'
|
||||
elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ep(\d+)', self.url):
|
||||
sort = 'bangumi'
|
||||
elif match1(html_content, r'<meta property="og:url" content="(https://www.bilibili.com/bangumi/play/[^"]+)"'):
|
||||
sort = 'bangumi'
|
||||
elif re.match(r'https?://live\.bilibili\.com/', self.url):
|
||||
sort = 'live'
|
||||
elif re.match(r'https?://vc\.bilibili\.com/video/(\d+)', self.url):
|
||||
sort = 'vc'
|
||||
elif re.match(r'https?://(www\.)?bilibili\.com/video/av(\d+)', self.url):
|
||||
sort = 'video'
|
||||
else:
|
||||
# flashvars?
|
||||
flashvars = re.search(r'flashvars="([^"]+)"', self.page).group(1)
|
||||
if flashvars is None:
|
||||
raise Exception('Unsupported page {}'.format(self.url))
|
||||
param = flashvars.split('&')[0]
|
||||
t, cid = param.split('=')
|
||||
t = t.strip()
|
||||
cid = cid.strip()
|
||||
if t == 'vid':
|
||||
sina_download_by_vid(cid, self.title, output_dir=kwargs['output_dir'], merge=kwargs['merge'], info_only=kwargs['info_only'])
|
||||
elif t == 'ykid':
|
||||
youku_download_by_vid(cid, self.title, output_dir=kwargs['output_dir'], merge=kwargs['merge'], info_only=kwargs['info_only'])
|
||||
elif t == 'uid':
|
||||
tudou_download_by_id(cid, self.title, output_dir=kwargs['output_dir'], merge=kwargs['merge'], info_only=kwargs['info_only'])
|
||||
else:
|
||||
raise NotImplementedError('Unknown flashvars {}'.format(flashvars))
|
||||
self.download_playlist_by_url(self.url, **kwargs)
|
||||
return
|
||||
|
||||
def live_entry(self, **kwargs):
|
||||
# Extract room ID from the short display ID (seen in the room
|
||||
# URL). The room ID is usually the same as the short ID, but not
|
||||
# always; case in point: https://live.bilibili.com/48, with 48
|
||||
# as the short ID and 63727 as the actual ID.
|
||||
room_short_id = re.search(r'live.bilibili.com/([^?]+)', self.url).group(1)
|
||||
room_init_api_response = json.loads(get_content(self.live_room_init_api_url.format(room_short_id)))
|
||||
self.room_id = room_init_api_response['data']['room_id']
|
||||
# regular av video
|
||||
if sort == 'video':
|
||||
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
|
||||
initial_state = json.loads(initial_state_text)
|
||||
|
||||
room_info_api_response = json.loads(get_content(self.live_room_info_api_url.format(self.room_id)))
|
||||
self.title = room_info_api_response['data']['title']
|
||||
playinfo_text = match1(html_content, r'__playinfo__=(.*?)</script><script>') # FIXME
|
||||
playinfo = json.loads(playinfo_text) if playinfo_text else None
|
||||
|
||||
api_url = self.live_api.format(self.room_id)
|
||||
json_data = json.loads(get_content(api_url))
|
||||
urls = [json_data['durl'][0]['url']]
|
||||
html_content_ = get_content(self.url, headers=self.bilibili_headers(cookie='CURRENT_FNVAL=16'))
|
||||
playinfo_text_ = match1(html_content_, r'__playinfo__=(.*?)</script><script>') # FIXME
|
||||
playinfo_ = json.loads(playinfo_text_) if playinfo_text_ else None
|
||||
|
||||
self.streams['live'] = {}
|
||||
self.streams['live']['src'] = urls
|
||||
self.streams['live']['container'] = 'flv'
|
||||
self.streams['live']['size'] = 0
|
||||
# warn if it is a multi-part video
|
||||
pn = initial_state['videoData']['videos']
|
||||
if pn > 1 and not kwargs.get('playlist'):
|
||||
log.w('This is a multipart video. (use --playlist to download all parts.)')
|
||||
|
||||
def vc_entry(self, **kwargs):
|
||||
vc_id = re.search(r'video/(\d+)', self.url)
|
||||
if not vc_id:
|
||||
vc_id = re.search(r'vcdetail\?vc=(\d+)', self.url)
|
||||
if not vc_id:
|
||||
log.wtf('Unknown url pattern')
|
||||
endpoint = 'http://api.vc.bilibili.com/clip/v1/video/detail?video_id={}&need_playurl=1'.format(vc_id.group(1))
|
||||
vc_meta = json.loads(get_content(endpoint, headers=fake_headers))
|
||||
if vc_meta['code'] != 0:
|
||||
log.wtf('{}\n{}'.format(vc_meta['msg'], vc_meta['message']))
|
||||
item = vc_meta['data']['item']
|
||||
self.title = item['description']
|
||||
# set video title
|
||||
self.title = initial_state['videoData']['title']
|
||||
# refine title for a specific part, if it is a multi-part video
|
||||
p = int(match1(self.url, r'[\?&]p=(\d+)') or match1(self.url, r'/index_(\d+)') or
|
||||
'1') # use URL to decide p-number, not initial_state['p']
|
||||
if pn > 1:
|
||||
part = initial_state['videoData']['pages'][p - 1]['part']
|
||||
self.title = '%s (P%s. %s)' % (self.title, p, part)
|
||||
|
||||
self.streams['vc'] = {}
|
||||
self.streams['vc']['src'] = [item['video_playurl']]
|
||||
self.streams['vc']['container'] = 'mp4'
|
||||
self.streams['vc']['size'] = int(item['video_size'])
|
||||
# construct playinfos
|
||||
avid = initial_state['aid']
|
||||
cid = initial_state['videoData']['pages'][p - 1]['cid'] # use p-number, not initial_state['videoData']['cid']
|
||||
current_quality, best_quality = None, None
|
||||
if playinfo is not None:
|
||||
current_quality = playinfo['data']['quality'] or None # 0 indicates an error, fallback to None
|
||||
if 'accept_quality' in playinfo['data'] and playinfo['data']['accept_quality'] != []:
|
||||
best_quality = playinfo['data']['accept_quality'][0]
|
||||
playinfos = []
|
||||
if playinfo is not None:
|
||||
playinfos.append(playinfo)
|
||||
if playinfo_ is not None:
|
||||
playinfos.append(playinfo_)
|
||||
# get alternative formats from API
|
||||
for qn in [80, 64, 32, 16]:
|
||||
# automatic format for durl: qn=0
|
||||
# for dash, qn does not matter
|
||||
if current_quality is None or qn < current_quality:
|
||||
api_url = self.bilibili_api(avid, cid, qn=qn)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
api_playinfo = json.loads(api_content)
|
||||
if api_playinfo['code'] == 0: # success
|
||||
playinfos.append(api_playinfo)
|
||||
else:
|
||||
message = api_playinfo['data']['message']
|
||||
if best_quality is None or qn <= best_quality:
|
||||
api_url = self.bilibili_interface_api(cid, qn=qn)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
api_playinfo_data = json.loads(api_content)
|
||||
if api_playinfo_data.get('quality'):
|
||||
playinfos.append({'code': 0, 'message': '0', 'ttl': 1, 'data': api_playinfo_data})
|
||||
if not playinfos:
|
||||
log.w(message)
|
||||
# use bilibili error video instead
|
||||
url = 'https://static.hdslb.com/error.mp4'
|
||||
_, container, size = url_info(url)
|
||||
self.streams['flv480'] = {'container': container, 'size': size, 'src': [url]}
|
||||
return
|
||||
|
||||
def bangumi_entry(self, **kwargs):
|
||||
bangumi_id = re.search(r'(\d+)', self.url).group(1)
|
||||
frag = urllib.parse.urlparse(self.url).fragment
|
||||
if frag:
|
||||
episode_id = frag
|
||||
else:
|
||||
episode_id = re.search(r'first_ep_id\s*=\s*"(\d+)"', self.page) or re.search(r'\/ep(\d+)', self.url).group(1)
|
||||
# cont = post_content('http://bangumi.bilibili.com/web_api/get_source', post_data=dict(episode_id=episode_id))
|
||||
# cid = json.loads(cont)['result']['cid']
|
||||
cont = get_content('http://bangumi.bilibili.com/web_api/episode/{}.json'.format(episode_id))
|
||||
ep_info = json.loads(cont)['result']['currentEpisode']
|
||||
for playinfo in playinfos:
|
||||
quality = playinfo['data']['quality']
|
||||
format_id = self.stream_qualities[quality]['id']
|
||||
container = self.stream_qualities[quality]['container'].lower()
|
||||
desc = self.stream_qualities[quality]['desc']
|
||||
|
||||
bangumi_data = get_bangumi_info(str(ep_info['seasonId']))
|
||||
bangumi_payment = bangumi_data.get('payment')
|
||||
if bangumi_payment and bangumi_payment['price'] != '0':
|
||||
log.w("It's a paid item")
|
||||
# ep_ids = collect_bangumi_epids(bangumi_data)
|
||||
if 'durl' in playinfo['data']:
|
||||
src, size = [], 0
|
||||
for durl in playinfo['data']['durl']:
|
||||
src.append(durl['url'])
|
||||
size += durl['size']
|
||||
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': src}
|
||||
|
||||
index_title = ep_info['indexTitle']
|
||||
long_title = ep_info['longTitle'].strip()
|
||||
cid = ep_info['danmaku']
|
||||
# DASH formats
|
||||
if 'dash' in playinfo['data']:
|
||||
audio_size_cache = {}
|
||||
for video in playinfo['data']['dash']['video']:
|
||||
# prefer the latter codecs!
|
||||
s = self.stream_qualities[video['id']]
|
||||
format_id = 'dash-' + s['id'] # prefix
|
||||
container = 'mp4' # enforce MP4 container
|
||||
desc = s['desc']
|
||||
audio_quality = s['audio_quality']
|
||||
baseurl = video['baseUrl']
|
||||
size = self.url_size(baseurl, headers=self.bilibili_headers(referer=self.url))
|
||||
|
||||
self.title = '{} [{} {}]'.format(self.title, index_title, long_title)
|
||||
self.download_by_vid(cid, bangumi=True, **kwargs)
|
||||
# find matching audio track
|
||||
audio_baseurl = playinfo['data']['dash']['audio'][0]['baseUrl']
|
||||
for audio in playinfo['data']['dash']['audio']:
|
||||
if int(audio['id']) == audio_quality:
|
||||
audio_baseurl = audio['baseUrl']
|
||||
break
|
||||
if not audio_size_cache.get(audio_quality, False):
|
||||
audio_size_cache[audio_quality] = self.url_size(audio_baseurl, headers=self.bilibili_headers(referer=self.url))
|
||||
size += audio_size_cache[audio_quality]
|
||||
|
||||
self.dash_streams[format_id] = {'container': container, 'quality': desc,
|
||||
'src': [[baseurl], [audio_baseurl]], 'size': size}
|
||||
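For DASH entries, the code above pairs each video representation with the audio track whose id matches the stream's `audio_quality`, then records the two base URLs plus their combined size. A sketch of that pairing over a fabricated playinfo fragment:

```
# Fabricated playinfo fragment; field names follow those used in the diff.
playinfo = {'data': {'dash': {
    'video': [{'id': 64, 'baseUrl': 'https://example.com/v64.m4s'}],
    'audio': [{'id': 30216, 'baseUrl': 'https://example.com/a30216.m4s'},
              {'id': 30280, 'baseUrl': 'https://example.com/a30280.m4s'}],
}}}
wanted_audio_quality = 30280

video = playinfo['data']['dash']['video'][0]
# default to the first audio track, then prefer an exact quality match
audio_baseurl = playinfo['data']['dash']['audio'][0]['baseUrl']
for audio in playinfo['data']['dash']['audio']:
    if int(audio['id']) == wanted_audio_quality:
        audio_baseurl = audio['baseUrl']
        break

dash_stream = {'container': 'mp4', 'src': [[video['baseUrl']], [audio_baseurl]]}
```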
|
||||
def check_oversea():
|
||||
url = 'https://interface.bilibili.com/player?id=cid:17778881'
|
||||
xml_lines = get_content(url).split('\n')
|
||||
for line in xml_lines:
|
||||
key = line.split('>')[0][1:]
|
||||
if key == 'country':
|
||||
value = line.split('>')[1].split('<')[0]
|
||||
if value != '中国':
|
||||
return True
|
||||
# get danmaku
|
||||
self.danmaku = get_content('http://comment.bilibili.com/%s.xml' % cid)
|
||||
|
||||
# bangumi
|
||||
elif sort == 'bangumi':
|
||||
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
|
||||
initial_state = json.loads(initial_state_text)
|
||||
|
||||
# warn if this bangumi has more than 1 video
|
||||
epn = len(initial_state['epList'])
|
||||
if epn > 1 and not kwargs.get('playlist'):
|
||||
log.w('This bangumi currently has %s videos. (use --playlist to download all videos.)' % epn)
|
||||
|
||||
# set video title
|
||||
self.title = initial_state['h1Title']
|
||||
|
||||
# construct playinfos
|
||||
ep_id = initial_state['epInfo']['id']
|
||||
avid = initial_state['epInfo']['aid']
|
||||
cid = initial_state['epInfo']['cid']
|
||||
playinfos = []
|
||||
api_url = self.bilibili_bangumi_api(avid, cid, ep_id)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
api_playinfo = json.loads(api_content)
|
||||
if api_playinfo['code'] == 0: # success
|
||||
playinfos.append(api_playinfo)
|
||||
else:
|
||||
return False
|
||||
return False
|
||||
log.e(api_playinfo['message'])
|
||||
return
|
||||
current_quality = api_playinfo['result']['quality']
|
||||
# get alternative formats from API
|
||||
for qn in [80, 64, 32, 16]:
|
||||
# automatic format for durl: qn=0
|
||||
# for dash, qn does not matter
|
||||
if qn != current_quality:
|
||||
api_url = self.bilibili_bangumi_api(avid, cid, ep_id, qn=qn)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
api_playinfo = json.loads(api_content)
|
||||
if api_playinfo['code'] == 0: # success
|
||||
playinfos.append(api_playinfo)
|
||||
|
||||
def check_sid():
|
||||
if not cookies:
|
||||
return False
|
||||
for cookie in cookies:
|
||||
if cookie.domain == '.bilibili.com' and cookie.name == 'sid':
|
||||
return True
|
||||
return False
|
||||
for playinfo in playinfos:
|
||||
if 'durl' in playinfo['result']:
|
||||
quality = playinfo['result']['quality']
|
||||
format_id = self.stream_qualities[quality]['id']
|
||||
container = self.stream_qualities[quality]['container'].lower()
|
||||
desc = self.stream_qualities[quality]['desc']
|
||||
|
||||
def fetch_sid(cid, aid):
|
||||
url = 'http://interface.bilibili.com/player?id=cid:{}&aid={}'.format(cid, aid)
|
||||
cookies = http.cookiejar.CookieJar()
|
||||
req = urllib.request.Request(url)
|
||||
res = urllib.request.urlopen(url)
|
||||
cookies.extract_cookies(res, req)
|
||||
for c in cookies:
|
||||
if c.domain == '.bilibili.com' and c.name == 'sid':
|
||||
return c.value
|
||||
raise
|
||||
src, size = [], 0
|
||||
for durl in playinfo['result']['durl']:
|
||||
src.append(durl['url'])
|
||||
size += durl['size']
|
||||
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': src}
|
||||
|
||||
def collect_bangumi_epids(json_data):
|
||||
eps = json_data['episodes'][::-1]
|
||||
return [ep['episode_id'] for ep in eps]
|
||||
# DASH formats
|
||||
if 'dash' in playinfo['result']:
|
||||
for video in playinfo['result']['dash']['video']:
|
||||
# playinfo['result']['quality'] does not reflect the correct quality of DASH stream
|
||||
quality = self.height_to_quality(video['height']) # convert height to quality code
|
||||
s = self.stream_qualities[quality]
|
||||
format_id = 'dash-' + s['id'] # prefix
|
||||
container = 'mp4' # enforce MP4 container
|
||||
desc = s['desc']
|
||||
audio_quality = s['audio_quality']
|
||||
baseurl = video['baseUrl']
|
||||
size = url_size(baseurl, headers=self.bilibili_headers(referer=self.url))
|
||||
|
||||
def get_bangumi_info(season_id):
|
||||
BASE_URL = 'http://bangumi.bilibili.com/jsonp/seasoninfo/'
|
||||
long_epoch = int(time.time() * 1000)
|
||||
req_url = BASE_URL + season_id + '.ver?callback=seasonListCallback&jsonp=jsonp&_=' + str(long_epoch)
|
||||
season_data = get_content(req_url)
|
||||
season_data = season_data[len('seasonListCallback('):]
|
||||
season_data = season_data[: -1 * len(');')]
|
||||
json_data = json.loads(season_data)
|
||||
return json_data['result']
|
||||
# find matching audio track
|
||||
audio_baseurl = playinfo['result']['dash']['audio'][0]['baseUrl']
|
||||
for audio in playinfo['result']['dash']['audio']:
|
||||
if int(audio['id']) == audio_quality:
|
||||
audio_baseurl = audio['baseUrl']
|
||||
break
|
||||
size += url_size(audio_baseurl, headers=self.bilibili_headers(referer=self.url))
|
||||
|
||||
def get_danmuku_xml(cid):
|
||||
return get_content('http://comment.bilibili.com/{}.xml'.format(cid))
|
||||
self.dash_streams[format_id] = {'container': container, 'quality': desc,
|
||||
'src': [[baseurl], [audio_baseurl]], 'size': size}
|
||||
|
||||
def parse_cid_playurl(xml):
|
||||
from xml.dom.minidom import parseString
|
||||
try:
|
||||
urls_list = []
|
||||
total_size = 0
|
||||
doc = parseString(xml.encode('utf-8'))
|
||||
durls = doc.getElementsByTagName('durl')
|
||||
cdn_cnt = len(durls[0].getElementsByTagName('url'))
|
||||
for i in range(cdn_cnt):
|
||||
urls_list.append([])
|
||||
for durl in durls:
|
||||
size = durl.getElementsByTagName('size')[0]
|
||||
total_size += int(size.firstChild.nodeValue)
|
||||
cnt = len(durl.getElementsByTagName('url'))
|
||||
for i in range(cnt):
|
||||
u = durl.getElementsByTagName('url')[i].firstChild.nodeValue
|
||||
urls_list[i].append(u)
|
||||
return urls_list, total_size
|
||||
except Exception as e:
|
||||
log.w(e)
|
||||
return [], 0
|
||||
# get danmaku
|
||||
self.danmaku = get_content('http://comment.bilibili.com/%s.xml' % cid)
|
||||
|
||||
def bilibili_download_playlist_by_url(url, **kwargs):
|
||||
url = url_locations([url])[0]
|
||||
# a bangumi here? possible?
|
||||
if 'live.bilibili' in url:
|
||||
site.download_by_url(url)
|
||||
elif 'bangumi.bilibili' in url:
|
||||
bangumi_id = re.search(r'(\d+)', url).group(1)
|
||||
bangumi_data = get_bangumi_info(bangumi_id)
|
||||
ep_ids = collect_bangumi_epids(bangumi_data)
|
||||
# vc video
|
||||
elif sort == 'vc':
|
||||
video_id = match1(self.url, r'https?://vc\.?bilibili\.com/video/(\d+)')
|
||||
api_url = self.bilibili_vc_api(video_id)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
api_playinfo = json.loads(api_content)
|
||||
|
||||
# set video title
|
||||
self.title = '%s (%s)' % (api_playinfo['data']['user']['name'], api_playinfo['data']['item']['id'])
|
||||
|
||||
height = api_playinfo['data']['item']['height']
|
||||
quality = self.height_to_quality(height) # convert height to quality code
|
||||
s = self.stream_qualities[quality]
|
||||
format_id = s['id']
|
||||
container = 'mp4' # enforce MP4 container
|
||||
desc = s['desc']
|
||||
|
||||
playurl = api_playinfo['data']['item']['video_playurl']
|
||||
size = int(api_playinfo['data']['item']['video_size'])
|
||||
|
||||
self.streams[format_id] = {'container': container, 'quality': desc, 'size': size, 'src': [playurl]}
|
||||
|
||||
# live
|
||||
elif sort == 'live':
|
||||
m = re.match(r'https?://live\.bilibili\.com/(\w+)', self.url)
|
||||
short_id = m.group(1)
|
||||
api_url = self.bilibili_live_room_init_api(short_id)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
room_init_info = json.loads(api_content)
|
||||
|
||||
room_id = room_init_info['data']['room_id']
|
||||
api_url = self.bilibili_live_room_info_api(room_id)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
room_info = json.loads(api_content)
|
||||
|
||||
# set video title
|
||||
self.title = room_info['data']['title'] + '.' + str(int(time.time()))
|
||||
|
||||
api_url = self.bilibili_live_api(room_id)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
video_info = json.loads(api_content)
|
||||
|
||||
durls = video_info['data']['durl']
|
||||
playurl = durls[0]['url']
|
||||
container = 'flv' # enforce FLV container
|
||||
self.streams['flv'] = {'container': container, 'quality': 'unknown',
|
||||
'size': 0, 'src': [playurl]}
|
||||
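The live branch above first resolves the short room id through `room_init` (the real id can differ, e.g. room 48 maps to 63727), then queries the play-URL API for the FLV stream. A hedged sketch of that two-step lookup; the JSON shapes follow the fields used in the diff, and `get_content` is assumed to be you-get's fetch helper:

```
import json
import re

def live_play_url(page_url, get_content):
    short_id = re.match(r'https?://live\.bilibili\.com/(\w+)', page_url).group(1)
    room_init = json.loads(get_content(
        'https://api.live.bilibili.com/room/v1/Room/room_init?id=%s' % short_id))
    room_id = room_init['data']['room_id']  # may differ from the short id
    play = json.loads(get_content(
        'https://api.live.bilibili.com/room/v1/Room/playUrl?cid=%s&quality=0&platform=web' % room_id))
    return play['data']['durl'][0]['url']  # first CDN URL of the live FLV
```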
|
||||
# audio
|
||||
elif sort == 'audio':
|
||||
m = re.match(r'https?://(?:www\.)?bilibili\.com/audio/au(\d+)', self.url)
|
||||
sid = m.group(1)
|
||||
api_url = self.bilibili_audio_info_api(sid)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
song_info = json.loads(api_content)
|
||||
|
||||
# set audio title
|
||||
self.title = song_info['data']['title']
|
||||
|
||||
# get lyrics
|
||||
self.lyrics = get_content(song_info['data']['lyric'])
|
||||
|
||||
api_url = self.bilibili_audio_api(sid)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
audio_info = json.loads(api_content)
|
||||
|
||||
playurl = audio_info['data']['cdns'][0]
|
||||
size = audio_info['data']['size']
|
||||
container = 'mp4' # enforce MP4 container
|
||||
self.streams['mp4'] = {'container': container,
|
||||
'size': size, 'src': [playurl]}
|
||||
|
||||
def extract(self, **kwargs):
|
||||
# set UA and referer for downloading
|
||||
headers = self.bilibili_headers(referer=self.url)
|
||||
self.ua, self.referer = headers['User-Agent'], headers['Referer']
|
||||
|
||||
if not self.streams_sorted:
|
||||
# no stream is available
|
||||
return
|
||||
|
||||
if 'stream_id' in kwargs and kwargs['stream_id']:
|
||||
# extract the stream
|
||||
stream_id = kwargs['stream_id']
|
||||
if stream_id not in self.streams and stream_id not in self.dash_streams:
|
||||
log.e('[Error] Invalid video format.')
|
||||
log.e('Run \'-i\' command with no specific video format to view all available formats.')
|
||||
exit(2)
|
||||
else:
|
||||
# extract stream with the best quality
|
||||
stream_id = self.streams_sorted[0]['id']
|
||||
|
||||
def download_playlist_by_url(self, url, **kwargs):
|
||||
self.url = url
|
||||
kwargs['playlist'] = True
|
||||
|
||||
html_content = get_content(self.url, headers=self.bilibili_headers())
|
||||
|
||||
# sort it out
|
||||
if re.match(r'https?://(www\.)?bilibili\.com/bangumi/play/ep(\d+)', self.url):
|
||||
sort = 'bangumi'
|
||||
elif match1(html_content, r'<meta property="og:url" content="(https://www.bilibili.com/bangumi/play/[^"]+)"'):
|
||||
sort = 'bangumi'
|
||||
elif re.match(r'https?://(www\.)?bilibili\.com/bangumi/media/md(\d+)', self.url) or \
|
||||
re.match(r'https?://bangumi\.bilibili\.com/anime/(\d+)', self.url):
|
||||
sort = 'bangumi_md'
|
||||
elif re.match(r'https?://(www\.)?bilibili\.com/video/av(\d+)', self.url):
|
||||
sort = 'video'
|
||||
elif re.match(r'https?://space\.?bilibili\.com/(\d+)/channel/detail\?.*cid=(\d+)', self.url):
|
||||
sort = 'space_channel'
|
||||
elif re.match(r'https?://space\.?bilibili\.com/(\d+)/favlist\?.*fid=(\d+)', self.url):
|
||||
sort = 'space_favlist'
|
||||
elif re.match(r'https?://space\.?bilibili\.com/(\d+)/video', self.url):
|
||||
sort = 'space_video'
|
||||
elif re.match(r'https?://(www\.)?bilibili\.com/audio/am(\d+)', self.url):
|
||||
sort = 'audio_menu'
|
||||
else:
|
||||
log.e('[Error] Unsupported URL pattern.')
|
||||
exit(1)
|
||||
|
||||
# regular av video
|
||||
if sort == 'video':
|
||||
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
|
||||
initial_state = json.loads(initial_state_text)
|
||||
aid = initial_state['videoData']['aid']
|
||||
pn = initial_state['videoData']['videos']
|
||||
for pi in range(1, pn + 1):
|
||||
purl = 'https://www.bilibili.com/video/av%s?p=%s' % (aid, pi)
|
||||
self.__class__().download_by_url(purl, **kwargs)
|
||||
|
||||
elif sort == 'bangumi':
|
||||
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
|
||||
initial_state = json.loads(initial_state_text)
|
||||
epn, i = len(initial_state['epList']), 0
|
||||
for ep in initial_state['epList']:
|
||||
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
|
||||
ep_id = ep['id']
|
||||
epurl = 'https://www.bilibili.com/bangumi/play/ep%s/' % ep_id
|
||||
self.__class__().download_by_url(epurl, **kwargs)
|
||||
|
||||
elif sort == 'bangumi_md':
|
||||
initial_state_text = match1(html_content, r'__INITIAL_STATE__=(.*?);\(function\(\)') # FIXME
|
||||
initial_state = json.loads(initial_state_text)
|
||||
epn, i = len(initial_state['mediaInfo']['episodes']), 0
|
||||
for ep in initial_state['mediaInfo']['episodes']:
|
||||
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
|
||||
ep_id = ep['ep_id']
|
||||
epurl = 'https://www.bilibili.com/bangumi/play/ep%s/' % ep_id
|
||||
self.__class__().download_by_url(epurl, **kwargs)
|
||||
|
||||
elif sort == 'space_channel':
|
||||
m = re.match(r'https?://space\.?bilibili\.com/(\d+)/channel/detail\?.*cid=(\d+)', self.url)
|
||||
mid, cid = m.group(1), m.group(2)
|
||||
api_url = self.bilibili_space_channel_api(mid, cid)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
|
||||
channel_info = json.loads(api_content)
|
||||
# TBD: channel of more than 100 videos
|
||||
|
||||
epn, i = len(channel_info['data']['list']['archives']), 0
|
||||
for video in channel_info['data']['list']['archives']:
|
||||
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
|
||||
url = 'https://www.bilibili.com/video/av%s' % video['aid']
|
||||
self.__class__().download_playlist_by_url(url, **kwargs)
|
||||
|
||||
elif sort == 'space_favlist':
|
||||
m = re.match(r'https?://space\.?bilibili\.com/(\d+)/favlist\?.*fid=(\d+)', self.url)
|
||||
vmid, fid = m.group(1), m.group(2)
|
||||
api_url = self.bilibili_space_favlist_api(vmid, fid)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
|
||||
favlist_info = json.loads(api_content)
|
||||
pc = favlist_info['data']['pagecount']
|
||||
|
||||
for pn in range(1, pc + 1):
|
||||
api_url = self.bilibili_space_favlist_api(vmid, fid, pn=pn)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers(referer=self.url))
|
||||
favlist_info = json.loads(api_content)
|
||||
|
||||
epn, i = len(favlist_info['data']['archives']), 0
|
||||
for video in favlist_info['data']['archives']:
|
||||
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
|
||||
url = 'https://www.bilibili.com/video/av%s' % video['aid']
|
||||
self.__class__().download_playlist_by_url(url, **kwargs)
|
||||
|
||||
elif sort == 'space_video':
|
||||
m = re.match(r'https?://space\.?bilibili\.com/(\d+)/video', self.url)
|
||||
mid = m.group(1)
|
||||
api_url = self.bilibili_space_video_api(mid)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
videos_info = json.loads(api_content)
|
||||
pc = videos_info['data']['pages']
|
||||
|
||||
for pn in range(1, pc + 1):
|
||||
api_url = self.bilibili_space_video_api(mid, pn=pn)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
videos_info = json.loads(api_content)
|
||||
|
||||
epn, i = len(videos_info['data']['vlist']), 0
|
||||
for video in videos_info['data']['vlist']:
|
||||
i += 1; log.w('Extracting %s of %s videos ...' % (i, epn))
|
||||
url = 'https://www.bilibili.com/video/av%s' % video['aid']
|
||||
self.__class__().download_playlist_by_url(url, **kwargs)
|
||||
|
||||
elif sort == 'audio_menu':
|
||||
m = re.match(r'https?://(?:www\.)?bilibili\.com/audio/am(\d+)', self.url)
|
||||
sid = m.group(1)
|
||||
#api_url = self.bilibili_audio_menu_info_api(sid)
|
||||
#api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
#menu_info = json.loads(api_content)
|
||||
api_url = self.bilibili_audio_menu_song_api(sid)
|
||||
api_content = get_content(api_url, headers=self.bilibili_headers())
|
||||
menusong_info = json.loads(api_content)
|
||||
epn, i = len(menusong_info['data']['data']), 0
|
||||
for song in menusong_info['data']['data']:
|
||||
i += 1; log.w('Extracting %s of %s songs ...' % (i, epn))
|
||||
url = 'https://www.bilibili.com/audio/au%s' % song['id']
|
||||
self.__class__().download_by_url(url, **kwargs)
|
||||
|
||||
base_url = url.split('#')[0]
|
||||
for ep_id in ep_ids:
|
||||
ep_url = '#'.join([base_url, ep_id])
|
||||
Bilibili().download_by_url(ep_url, **kwargs)
|
||||
else:
|
||||
aid = re.search(r'av(\d+)', url).group(1)
|
||||
page_list = json.loads(get_content('http://www.bilibili.com/widget/getPageList?aid={}'.format(aid)))
|
||||
page_cnt = len(page_list)
|
||||
for no in range(1, page_cnt+1):
|
||||
page_url = 'http://www.bilibili.com/video/av{}/index_{}.html'.format(aid, no)
|
||||
subtitle = page_list[no-1]['pagename']
|
||||
Bilibili().download_by_url(page_url, subtitle=subtitle, **kwargs)
|
||||
|
||||
site = Bilibili()
|
||||
download = site.download_by_url
|
||||
download_playlist = bilibili_download_playlist_by_url
|
||||
download_playlist = site.download_playlist_by_url
|
||||
|
||||
bilibili_download = download
|
||||
|
@ -25,10 +25,10 @@ def coub_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
loop_file_path = get_loop_file_path(title, output_dir)
single_file_path = audio_file_path
if audio_duration > video_duration:
write_loop_file(int(audio_duration / video_duration), loop_file_path, video_file_name)
write_loop_file(round(audio_duration / video_duration), loop_file_path, video_file_name)
else:
single_file_path = audio_file_path
write_loop_file(int(video_duration / audio_duration), loop_file_path, audio_file_name)
write_loop_file(round(video_duration / audio_duration), loop_file_path, audio_file_name)

ffmpeg.ffmpeg_concat_audio_and_video([loop_file_path, single_file_path], title + "_full", "mp4")
cleanup_files([video_file_path, audio_file_path, loop_file_path])
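The coub change above swaps `int()` truncation for `round()` when computing how many times the shorter track must be looped: with, say, 9.8 s of audio over a 5 s video, truncation loops once and cuts roughly half the audio, while rounding loops twice. A tiny illustration:

```
audio_duration, video_duration = 9.8, 5.0
print(int(audio_duration / video_duration))    # 1 -> loop count underestimates the ratio
print(round(audio_duration / video_duration))  # 2 -> closer to the real ratio
```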
@ -1,89 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
__all__ = ['dilidili_download']
|
||||
|
||||
from ..common import *
|
||||
from .ckplayer import ckplayer_download
|
||||
|
||||
headers = {
|
||||
'DNT': '1',
|
||||
'Accept-Encoding': 'gzip, deflate, sdch, br',
|
||||
'Accept-Language': 'en-CA,en;q=0.8,en-US;q=0.6,zh-CN;q=0.4,zh;q=0.2',
|
||||
'Upgrade-Insecure-Requests': '1',
|
||||
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36',
|
||||
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
|
||||
'Cache-Control': 'max-age=0',
|
||||
'Referer': 'http://www.dilidili.com/',
|
||||
'Connection': 'keep-alive',
|
||||
'Save-Data': 'on',
|
||||
}
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def dilidili_parser_data_to_stream_types(typ ,vid ,hd2 ,sign, tmsign, ulk):
|
||||
"""->list"""
|
||||
another_url = 'https://newplayer.jfrft.com/parse.php?xmlurl=null&type={typ}&vid={vid}&hd={hd2}&sign={sign}&tmsign={tmsign}&userlink={ulk}'.format(typ = typ, vid = vid, hd2 = hd2, sign = sign, tmsign = tmsign, ulk = ulk)
|
||||
parse_url = 'http://player.005.tv/parse.php?xmlurl=null&type={typ}&vid={vid}&hd={hd2}&sign={sign}&tmsign={tmsign}&userlink={ulk}'.format(typ = typ, vid = vid, hd2 = hd2, sign = sign, tmsign = tmsign, ulk = ulk)
|
||||
html = get_content(another_url, headers=headers)
|
||||
|
||||
info = re.search(r'(\{[^{]+\})(\{[^{]+\})(\{[^{]+\})(\{[^{]+\})(\{[^{]+\})', html).groups()
|
||||
info = [i.strip('{}').split('->') for i in info]
|
||||
info = {i[0]: i [1] for i in info}
|
||||
|
||||
stream_types = []
|
||||
for i in zip(info['deft'].split('|'), info['defa'].split('|')):
|
||||
stream_types.append({'id': str(i[1][-1]), 'container': 'mp4', 'video_profile': i[0]})
|
||||
return stream_types
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def dilidili_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
global headers
|
||||
re_str = r'http://www.dilidili.com/watch\S+'
|
||||
if re.match(r'http://www.dilidili.wang', url):
|
||||
re_str = r'http://www.dilidili.wang/watch\S+'
|
||||
headers['Referer'] = 'http://www.dilidili.wang/'
|
||||
elif re.match(r'http://www.dilidili.mobi', url):
|
||||
re_str = r'http://www.dilidili.mobi/watch\S+'
|
||||
headers['Referer'] = 'http://www.dilidili.mobi/'
|
||||
|
||||
if re.match(re_str, url):
|
||||
html = get_content(url)
|
||||
title = match1(html, r'<title>(.+)丨(.+)</title>') #title
|
||||
|
||||
# player loaded via internal iframe
|
||||
frame_url = re.search(r'<iframe src=\"(.+?)\"', html).group(1)
|
||||
logging.debug('dilidili_download: %s' % frame_url)
|
||||
|
||||
#https://player.005.tv:60000/?vid=a8760f03fd:a04808d307&v=yun&sign=a68f8110cacd892bc5b094c8e5348432
|
||||
html = get_content(frame_url, headers=headers, decoded=False).decode('utf-8')
|
||||
|
||||
match = re.search(r'(.+?)var video =(.+?);', html)
|
||||
vid = match1(html, r'var vid="(.+)"')
|
||||
hd2 = match1(html, r'var hd2="(.+)"')
|
||||
typ = match1(html, r'var typ="(.+)"')
|
||||
sign = match1(html, r'var sign="(.+)"')
|
||||
tmsign = match1(html, r'tmsign=([A-Za-z0-9]+)')
|
||||
ulk = match1(html, r'var ulk="(.+)"')
|
||||
|
||||
# here s the parser...
|
||||
stream_types = dilidili_parser_data_to_stream_types(typ, vid, hd2, sign, tmsign, ulk)
|
||||
|
||||
#get best
|
||||
best_id = max([i['id'] for i in stream_types])
|
||||
|
||||
parse_url = 'http://player.005.tv/parse.php?xmlurl=null&type={typ}&vid={vid}&hd={hd2}&sign={sign}&tmsign={tmsign}&userlink={ulk}'.format(typ = typ, vid = vid, hd2 = best_id, sign = sign, tmsign = tmsign, ulk = ulk)
|
||||
|
||||
another_url = 'https://newplayer.jfrft.com/parse.php?xmlurl=null&type={typ}&vid={vid}&hd={hd2}&sign={sign}&tmsign={tmsign}&userlink={ulk}'.format(typ = typ, vid = vid, hd2 = hd2, sign = sign, tmsign = tmsign, ulk = ulk)
|
||||
|
||||
ckplayer_download(another_url, output_dir, merge, info_only, is_xml = True, title = title, headers = headers)
|
||||
|
||||
#type_ = ''
|
||||
#size = 0
|
||||
|
||||
#type_, ext, size = url_info(url)
|
||||
#print_info(site_info, title, type_, size)
|
||||
#if not info_only:
|
||||
#download_urls([url], title, ext, total_size=None, output_dir=output_dir, merge=merge)
|
||||
|
||||
site_info = "dilidili"
|
||||
download = dilidili_download
|
||||
download_playlist = playlist_not_supported('dilidili')
|
@ -7,6 +7,7 @@ from ..common import (
url_size,
print_info,
get_content,
fake_headers,
download_urls,
playlist_not_supported,
)
@ -16,13 +17,19 @@ __all__ = ['douyin_download_by_url']


def douyin_download_by_url(url, **kwargs):
page_content = get_content(url)
page_content = get_content(url, headers=fake_headers)
match_rule = re.compile(r'var data = \[(.*?)\];')
video_info = json.loads(match_rule.findall(page_content)[0])
video_url = video_info['video']['play_addr']['url_list'][0]
title = video_info['cha_list'][0]['cha_name']
# fix: https://www.douyin.com/share/video/6553248251821165832
# if there is no title, use desc
cha_list = video_info['cha_list']
if cha_list:
title = cha_list[0]['cha_name']
else:
title = video_info['desc']
video_format = 'mp4'
size = url_size(video_url)
size = url_size(video_url, faker=True)
print_info(
site_info='douyin.com', title=title,
type=video_format, size=size
@ -30,6 +37,7 @@ def douyin_download_by_url(url, **kwargs):
if not kwargs['info_only']:
download_urls(
urls=[video_url], title=title, ext=video_format, total_size=size,
faker=True,
**kwargs
)
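The douyin change above adds a fallback: when `cha_list` is empty, the clip's description is used as the title. A compact sketch of the extraction and fallback, with a fabricated page snippet:

```
import json
import re

# Fabricated page snippet; the real share page embeds a similar "var data = [...]" block.
page_content = 'var data = [{"video": {"play_addr": {"url_list": ["https://example.com/v.mp4"]}}, "cha_list": [], "desc": "some clip"}];'
video_info = json.loads(re.compile(r'var data = \[(.*?)\];').findall(page_content)[0])

video_url = video_info['video']['play_addr']['url_list'][0]
cha_list = video_info['cha_list']
title = cha_list[0]['cha_name'] if cha_list else video_info['desc']
print(title)  # some clip
```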
|
||||
|
@ -9,6 +9,10 @@ import hashlib
|
||||
import time
|
||||
import re
|
||||
|
||||
headers = {
|
||||
'user-agent': 'Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4'
|
||||
}
|
||||
|
||||
def douyutv_video_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
ep = 'http://vmobile.douyu.com/video/getInfo?vid='
|
||||
patt = r'show/([0-9A-Za-z]+)'
|
||||
@ -19,7 +23,7 @@ def douyutv_video_download(url, output_dir='.', merge=True, info_only=False, **k
|
||||
log.wtf('Unknown url pattern')
|
||||
vid = hit.group(1)
|
||||
|
||||
page = get_content(url)
|
||||
page = get_content(url, headers=headers)
|
||||
hit = re.search(title_patt, page)
|
||||
if hit is None:
|
||||
title = vid
|
||||
@ -35,21 +39,18 @@ def douyutv_video_download(url, output_dir='.', merge=True, info_only=False, **k
|
||||
urls = general_m3u8_extractor(m3u8_url)
|
||||
download_urls(urls, title, 'ts', 0, output_dir=output_dir, merge=merge, **kwargs)
|
||||
|
||||
def douyutv_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
|
||||
def douyutv_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
if 'v.douyu.com/show/' in url:
|
||||
douyutv_video_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
return
|
||||
|
||||
headers = {
|
||||
'user-agent': 'Mozilla/5.0 (iPad; CPU OS 8_1_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B466 Safari/600.1.4'
|
||||
}
|
||||
|
||||
url = re.sub(r'[w.]*douyu.com','m.douyu.com',url)
|
||||
url = re.sub(r'.*douyu.com','https://m.douyu.com/room', url)
|
||||
html = get_content(url, headers)
|
||||
room_id_patt = r'room_id\s*:\s*(\d+),'
|
||||
room_id_patt = r'"rid"\s*:\s*(\d+),'
|
||||
room_id = match1(html, room_id_patt)
|
||||
if room_id == "0":
|
||||
room_id = url[url.rfind('/')+1:]
|
||||
room_id = url[url.rfind('/') + 1:]
|
||||
|
||||
api_url = "http://www.douyutv.com/api/v1/"
|
||||
args = "room/%s?aid=wp&client_sys=wp&time=%d" % (room_id, int(time.time()))
|
||||
@ -60,20 +61,21 @@ def douyutv_download(url, output_dir = '.', merge = True, info_only = False, **k
|
||||
content = get_content(json_request_url, headers)
|
||||
json_content = json.loads(content)
|
||||
data = json_content['data']
|
||||
server_status = json_content.get('error',0)
|
||||
if server_status is not 0:
|
||||
server_status = json_content.get('error', 0)
|
||||
if server_status != 0:
|
||||
raise ValueError("Server returned error:%s" % server_status)
|
||||
|
||||
title = data.get('room_name')
|
||||
show_status = data.get('show_status')
|
||||
if show_status is not "1":
|
||||
if show_status != "1":
|
||||
raise ValueError("The live stream is not online! (Errno:%s)" % server_status)
|
||||
|
||||
real_url = data.get('rtmp_url') + '/' + data.get('rtmp_live')
|
||||
|
||||
print_info(site_info, title, 'flv', float('inf'))
|
||||
if not info_only:
|
||||
download_url_ffmpeg(real_url, title, 'flv', params={}, output_dir = output_dir, merge = merge)
|
||||
download_url_ffmpeg(real_url, title, 'flv', params={}, output_dir=output_dir, merge=merge)
|
||||
|
||||
|
||||
site_info = "douyu.com"
|
||||
download = douyutv_download
|
||||
|
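The douyu change above switches to the mobile room page and reads the room id from the `"rid"` field instead of the old `room_id:` pattern. A small sketch of that URL rewrite and id extraction, with a fabricated page snippet:

```
import re

url = 'https://www.douyu.com/9999'
# the extractor first rewrites any douyu URL to the mobile room page
mobile_url = re.sub(r'.*douyu.com', 'https://m.douyu.com/room', url)

html = '<script>window.$DATA = {"rid": 288016, "roomName": "demo"}</script>'  # fabricated
room_id = re.search(r'"rid"\s*:\s*(\d+),', html).group(1)
print(mobile_url, room_id)  # https://m.douyu.com/room/9999 288016
```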
@ -67,7 +67,7 @@ bokecc_patterns = [r'bokecc\.com/flash/pocle/player\.swf\?siteid=(.+?)&vid=(.{32
|
||||
recur_limit = 3
|
||||
|
||||
|
||||
def embed_download(url, output_dir = '.', merge = True, info_only = False ,**kwargs):
|
||||
def embed_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
content = get_content(url, headers=fake_headers)
|
||||
found = False
|
||||
title = match1(content, '<title>([^<>]+)</title>')
|
||||
@ -75,43 +75,43 @@ def embed_download(url, output_dir = '.', merge = True, info_only = False ,**kwa
|
||||
vids = matchall(content, youku_embed_patterns)
|
||||
for vid in set(vids):
|
||||
found = True
|
||||
youku_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
youku_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
vids = matchall(content, tudou_embed_patterns)
|
||||
for vid in set(vids):
|
||||
found = True
|
||||
tudou_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
tudou_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
vids = matchall(content, yinyuetai_embed_patterns)
|
||||
for vid in vids:
|
||||
found = True
|
||||
yinyuetai_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
yinyuetai_download_by_id(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
vids = matchall(content, iqiyi_embed_patterns)
|
||||
for vid in vids:
|
||||
found = True
|
||||
iqiyi_download_by_vid((vid[1], vid[0]), title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
iqiyi_download_by_vid((vid[1], vid[0]), title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
urls = matchall(content, netease_embed_patterns)
|
||||
for url in urls:
|
||||
found = True
|
||||
netease_download(url, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
netease_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
urls = matchall(content, vimeo_embed_patters)
|
||||
for url in urls:
|
||||
found = True
|
||||
vimeo_download_by_id(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only, referer=url)
|
||||
vimeo_download_by_id(url, title=title, output_dir=output_dir, merge=merge, info_only=info_only, referer=url, **kwargs)
|
||||
|
||||
urls = matchall(content, dailymotion_embed_patterns)
|
||||
for url in urls:
|
||||
found = True
|
||||
dailymotion_download(url, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
dailymotion_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
aids = matchall(content, bilibili_embed_patterns)
|
||||
for aid in aids:
|
||||
found = True
|
||||
url = 'http://www.bilibili.com/video/av%s/' % aid
|
||||
bilibili_download(url, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
bilibili_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
iqiyi_urls = matchall(content, iqiyi_patterns)
|
||||
for url in iqiyi_urls:
|
||||
@ -133,7 +133,7 @@ def embed_download(url, output_dir = '.', merge = True, info_only = False ,**kwa
|
||||
r = 1
|
||||
else:
|
||||
r += 1
|
||||
iframes = matchall(content, [r'<iframe.+?src=(?:\"|\')(.+?)(?:\"|\')'])
|
||||
iframes = matchall(content, [r'<iframe.+?src=(?:\"|\')(.*?)(?:\"|\')'])
|
||||
for iframe in iframes:
|
||||
if not iframe.startswith('http'):
|
||||
src = urllib.parse.urljoin(url, iframe)
|
||||
|
@ -6,6 +6,7 @@ from ..common import *
import json

def facebook_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
url = re.sub(r'//.*?facebook.com','//facebook.com',url)
html = get_html(url)

title = r1(r'<title id="pageTitle">(.+)</title>', html)
@ -1,54 +0,0 @@
#!/usr/bin/env python

__all__ = ['fantasy_download']

from ..common import *
import json
import random
from urllib.parse import urlparse, parse_qs


def fantasy_download_by_id_channelId(id = 0, channelId = 0, output_dir = '.', merge = True, info_only = False,
**kwargs):
api_url = 'http://www.fantasy.tv/tv/playDetails.action?' \
'myChannelId=1&id={id}&channelId={channelId}&t={t}'.format(id = id,
channelId = channelId,
t = str(random.random())
)
html = get_content(api_url)
html = json.loads(html)

if int(html['status']) != 100000:
raise Exception('API error!')

title = html['data']['tv']['title']

video_url = html['data']['tv']['videoPath']
headers = fake_headers.copy()
headers['Referer'] = api_url
type, ext, size = url_info(video_url, headers=headers)

print_info(site_info, title, type, size)
if not info_only:
download_urls([video_url], title, ext, size, output_dir, merge = merge, headers = headers)


def fantasy_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
if 'fantasy.tv' not in url:
raise Exception('Wrong place!')

q = parse_qs(urlparse(url).query)

if 'tvId' not in q or 'channelId' not in q:
raise Exception('No enough arguments!')

tvId = q['tvId'][0]
channelId = q['channelId'][0]

fantasy_download_by_id_channelId(id = tvId, channelId = channelId, output_dir = output_dir, merge = merge,
info_only = info_only, **kwargs)


site_info = "fantasy.tv"
download = fantasy_download
download_playlist = playlist_not_supported('fantasy.tv')
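The removed fantasy.tv extractor above leaned on `parse_qs` to pull `tvId` and `channelId` out of the page URL; a minimal sketch of that lookup, with a made-up URL:

```python
from urllib.parse import urlparse, parse_qs

# hypothetical fantasy.tv URL, for illustration only
q = parse_qs(urlparse('http://www.fantasy.tv/tv/play.action?tvId=123456&channelId=7').query)
tvId, channelId = q['tvId'][0], q['channelId'][0]  # '123456', '7'
```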
@ -74,7 +74,7 @@ def get_api_key(page):
# this happens only when the url points to a gallery page
# that contains no inline api_key(and never makes xhr api calls)
# in fact this might be a better approch for getting a temporary api key
# since there's no place for a user to add custom infomation that may
# since there's no place for a user to add custom information that may
# misguide the regex in the homepage
if not match:
return match1(get_html('https://flickr.com'), pattern_inline_api_key)
@ -59,7 +59,7 @@ def google_download(url, output_dir = '.', merge = True, info_only = False, **kw
u = '/'.join(t)
real_urls.append(u)
if not real_urls:
real_urls = [r1(r'<meta property="og:image" content="([^"]+)', html)]
real_urls = re.findall(r'<meta property="og:image" content="([^"]+)', html)
real_urls = [re.sub(r'w\d+-h\d+-p', 's0', u) for u in real_urls]
post_date = r1(r'"?(20\d\d[-/]?[01]\d[-/]?[0123]\d)"?', html)
post_id = r1(r'/posts/([^"]+)', html)
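The google hunk above switches from a single `r1` match to `re.findall` and then rewrites each thumbnail URL into its full-size form; a quick sketch of that substitution, with a made-up URL:

```python
import re

# hypothetical googleusercontent thumbnail, for illustration only
u = 'https://lh3.googleusercontent.com/abc123/w506-h750-p/photo.jpg'
full = re.sub(r'w\d+-h\d+-p', 's0', u)
# full == 'https://lh3.googleusercontent.com/abc123/s0/photo.jpg'
```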
@ -1,85 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import math
|
||||
import traceback
|
||||
import urllib.parse as urlparse
|
||||
|
||||
from ..common import *
|
||||
|
||||
__all__ = ['huaban_download']
|
||||
|
||||
site_info = '花瓣 (Huaban)'
|
||||
|
||||
LIMIT = 100
|
||||
|
||||
|
||||
class Board:
|
||||
def __init__(self, title, pins):
|
||||
self.title = title
|
||||
self.pins = pins
|
||||
self.pin_count = len(pins)
|
||||
|
||||
|
||||
class Pin:
|
||||
host = 'http://img.hb.aicdn.com/'
|
||||
|
||||
def __init__(self, pin_json):
|
||||
img_file = pin_json['file']
|
||||
self.id = str(pin_json['pin_id'])
|
||||
self.url = urlparse.urljoin(self.host, img_file['key'])
|
||||
self.ext = img_file['type'].split('/')[-1]
|
||||
|
||||
|
||||
def construct_url(url, **params):
|
||||
param_str = urlparse.urlencode(params)
|
||||
return url + '?' + param_str
|
||||
|
||||
|
||||
def extract_json_data(url, **params):
|
||||
url = construct_url(url, **params)
|
||||
html = get_content(url, headers=fake_headers)
|
||||
json_string = match1(html, r'app.page\["board"\] = (.*?});')
|
||||
json_data = json.loads(json_string)
|
||||
return json_data
|
||||
|
||||
|
||||
def extract_board_data(url):
|
||||
json_data = extract_json_data(url, limit=LIMIT)
|
||||
pin_list = json_data['pins']
|
||||
title = json_data['title']
|
||||
pin_count = json_data['pin_count']
|
||||
pin_count -= len(pin_list)
|
||||
|
||||
while pin_count > 0:
|
||||
json_data = extract_json_data(url, max=pin_list[-1]['pin_id'],
|
||||
limit=LIMIT)
|
||||
pins = json_data['pins']
|
||||
pin_list += pins
|
||||
pin_count -= len(pins)
|
||||
|
||||
return Board(title, list(map(Pin, pin_list)))
|
||||
|
||||
|
||||
def huaban_download_board(url, output_dir, **kwargs):
|
||||
kwargs['merge'] = False
|
||||
board = extract_board_data(url)
|
||||
output_dir = os.path.join(output_dir, board.title)
|
||||
print_info(site_info, board.title, 'jpg', float('Inf'))
|
||||
for pin in board.pins:
|
||||
download_urls([pin.url], pin.id, pin.ext, float('Inf'),
|
||||
output_dir=output_dir, faker=True, **kwargs)
|
||||
|
||||
|
||||
def huaban_download(url, output_dir='.', **kwargs):
|
||||
if re.match(r'http://huaban\.com/boards/\d+/', url):
|
||||
huaban_download_board(url, output_dir, **kwargs)
|
||||
else:
|
||||
print('Only board (画板) pages are supported currently')
|
||||
print('ex: http://huaban.com/boards/12345678/')
|
||||
|
||||
|
||||
download = huaban_download
|
||||
download_playlist = playlist_not_supported("huaban")
|
@ -110,7 +110,7 @@ def icourses_playlist_download(url, output_dir='.', **kwargs):
video_list = re.findall(resid_courseid_patt, page)

if not video_list:
raise Exception('Unkown url pattern')
raise Exception('Unknown url pattern')

for video in video_list:
video_url = change_for_video_ip.format(video[0], video[1])
@ -27,8 +27,11 @@ def instagram_download(url, output_dir='.', merge=True, info_only=False, **kwarg
for edge in edges:
title = edge['node']['shortcode']
image_url = edge['node']['display_url']
ext = image_url.split('.')[-1]
if 'video_url' in edge['node']:
image_url = edge['node']['video_url']
ext = image_url.split('?')[0].split('.')[-1]
size = int(get_head(image_url)['Content-Length'])

print_info(site_info, title, ext, size)
if not info_only:
download_urls(urls=[image_url],

@ -39,8 +42,11 @@ def instagram_download(url, output_dir='.', merge=True, info_only=False, **kwarg
else:
title = info['entry_data']['PostPage'][0]['graphql']['shortcode_media']['shortcode']
image_url = info['entry_data']['PostPage'][0]['graphql']['shortcode_media']['display_url']
ext = image_url.split('.')[-1]
if 'video_url' in info['entry_data']['PostPage'][0]['graphql']['shortcode_media']:
image_url =info['entry_data']['PostPage'][0]['graphql']['shortcode_media']['video_url']
ext = image_url.split('?')[0].split('.')[-1]
size = int(get_head(image_url)['Content-Length'])

print_info(site_info, title, ext, size)
if not info_only:
download_urls(urls=[image_url],
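The added instagram lines take the extension from the video URL only after stripping its query string; a short sketch with a made-up CDN URL:

```python
# hypothetical CDN URL, for illustration only
image_url = 'https://scontent.cdninstagram.com/v/t50.2886-16/clip.mp4?efg=abc&oe=5C00'
ext = image_url.split('?')[0].split('.')[-1]  # 'mp4', not 'mp4?efg=abc&oe=5C00'
```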
@ -136,12 +136,9 @@ class Iqiyi(VideoExtractor):
r1(r'vid=([^&]+)', self.url) or \
r1(r'data-player-videoid="([^"]+)"', html) or r1(r'vid=(.+?)\&', html) or r1(r'param\[\'vid\'\]\s*=\s*"(.+?)"', html)
self.vid = (tvid, videoid)
info_u = 'http://mixer.video.iqiyi.com/jp/mixin/videos/' + tvid
mixin = get_content(info_u)
mixin_json = json.loads(mixin[len('var tvInfoJs='):])
real_u = mixin_json['url']
real_html = get_content(real_u)
self.title = match1(real_html, '<title>([^<]+)').split('-')[0]
info_u = 'http://pcw-api.iqiyi.com/video/video/playervideoinfo?tvid=' + tvid
json_res = get_content(info_u)
self.title = json.loads(json_res)['data']['vn']
tvid, videoid = self.vid
info = getVMS(tvid, videoid)
assert info['code'] == 'A00000', "can't play this video"
@ -17,20 +17,20 @@ headers = {

def iwara_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
global headers
video_hash=match1(url, r'http://\w+.iwara.tv/videos/(\w+)')
video_url=match1(url, r'(http://\w+.iwara.tv)/videos/\w+')
html = get_content(url,headers=headers)
video_hash = match1(url, r'https?://\w+.iwara.tv/videos/(\w+)')
video_url = match1(url, r'(https?://\w+.iwara.tv)/videos/\w+')
html = get_content(url, headers=headers)
title = r1(r'<title>(.*)</title>', html)
api_url=video_url+'/api/video/'+video_hash
content=get_content(api_url,headers=headers)
data=json.loads(content)
type,ext,size=url_info(data[0]['uri'], headers=headers)
down_urls=data[0]['uri']
print_info(down_urls,title+data[0]['resolution'],type,size)
api_url = video_url + '/api/video/' + video_hash
content = get_content(api_url, headers=headers)
data = json.loads(content)
down_urls = 'https:' + data[0]['uri']
type, ext, size = url_info(down_urls, headers=headers)
print_info(site_info, title+data[0]['resolution'], type, size)

if not info_only:
download_urls([down_urls], title, ext, size, output_dir, merge = merge,headers=headers)
download_urls([down_urls], title, ext, size, output_dir, merge=merge, headers=headers)

site_info = "iwara"
site_info = "Iwara"
download = iwara_download
download_playlist = playlist_not_supported('iwara')
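The iwara API returns a protocol-relative `uri`, which the new code completes before probing it; a sketch with a made-up response:

```python
# hypothetical API response, for illustration only
data = [{'uri': '//galaxy.iwara.tv/file.php?file=sample.mp4', 'resolution': 'Source'}]
down_urls = 'https:' + data[0]['uri']
# 'https://galaxy.iwara.tv/file.php?file=sample.mp4'
```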
@ -1,85 +1,132 @@
|
||||
#!/usr/bin/env python
|
||||
__all__ = ['ixigua_download', 'ixigua_download_playlist']
|
||||
import base64
|
||||
import random
|
||||
|
||||
import binascii
|
||||
|
||||
from ..common import *
|
||||
import random
|
||||
import ctypes
|
||||
from json import loads
|
||||
|
||||
def get_video_id(text):
|
||||
re_id = r"videoId: '(.*?)'"
|
||||
return re.findall(re_id, text)[0]
|
||||
__all__ = ['ixigua_download', 'ixigua_download_playlist_by_url']
|
||||
|
||||
def get_r():
|
||||
return str(random.random())[2:]
|
||||
headers = {
|
||||
"user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 "
|
||||
"Safari/537.36",
|
||||
}
|
||||
|
||||
def right_shift(val, n):
|
||||
return val >> n if val >= 0 else (val + 0x100000000) >> n
|
||||
|
||||
def get_s(text):
|
||||
"""get video info"""
|
||||
id = get_video_id(text)
|
||||
p = get_r()
|
||||
url = 'http://i.snssdk.com/video/urls/v/1/toutiao/mp4/%s' % id
|
||||
n = parse.urlparse(url).path + '?r=%s' % p
|
||||
c = binascii.crc32(n.encode('utf-8'))
|
||||
s = right_shift(c, 0)
|
||||
title = ''.join(re.findall(r"title: '(.*?)',", text))
|
||||
return url + '?r=%s&s=%s' % (p, s), title
|
||||
def int_overflow(val):
|
||||
maxint = 2147483647
|
||||
if not -maxint - 1 <= val <= maxint:
|
||||
val = (val + (maxint + 1)) % (2 * (maxint + 1)) - maxint - 1
|
||||
return val
|
||||
|
||||
def get_moment(url, user_id, base_url, video_list):
|
||||
"""Recursively obtaining a video list"""
|
||||
video_list_data = json.loads(get_content(url))
|
||||
if not video_list_data['next']['max_behot_time']:
|
||||
return video_list
|
||||
[video_list.append(i["display_url"]) for i in video_list_data["data"]]
|
||||
max_behot_time = video_list_data['next']['max_behot_time']
|
||||
_param = {
|
||||
'user_id': user_id,
|
||||
'base_url': base_url,
|
||||
'video_list': video_list,
|
||||
'url': base_url.format(user_id=user_id, max_behot_time=max_behot_time),
|
||||
}
|
||||
return get_moment(**_param)
|
||||
|
||||
def ixigua_download(url, output_dir='.', info_only=False, **kwargs):
|
||||
""" Download a single video
|
||||
Sample URL: https://www.ixigua.com/a6487187567887254029/#mid=59051127876
|
||||
"""
|
||||
try:
|
||||
video_info_url, title = get_s(get_content(url))
|
||||
video_info = json.loads(get_content(video_info_url))
|
||||
except Exception:
|
||||
raise NotImplementedError(url)
|
||||
try:
|
||||
video_url = base64.b64decode(video_info["data"]["video_list"]["video_1"]["main_url"]).decode()
|
||||
except Exception:
|
||||
raise NotImplementedError(url)
|
||||
filetype, ext, size = url_info(video_url)
|
||||
print_info(site_info, title, filetype, size)
|
||||
def unsigned_right_shitf(n, i):
|
||||
if n < 0:
|
||||
n = ctypes.c_uint32(n).value
|
||||
if i < 0:
|
||||
return -int_overflow(n << abs(i))
|
||||
return int_overflow(n >> i)
|
||||
|
||||
|
||||
def get_video_url_from_video_id(video_id):
|
||||
"""Splicing URLs according to video ID to get video details"""
|
||||
# from js
|
||||
data = [""] * 256
|
||||
for index, _ in enumerate(data):
|
||||
t = index
|
||||
for i in range(8):
|
||||
t = -306674912 ^ unsigned_right_shitf(t, 1) if 1 & t else unsigned_right_shitf(t, 1)
|
||||
data[index] = t
|
||||
|
||||
def tmp():
|
||||
rand_num = random.random()
|
||||
path = "/video/urls/v/1/toutiao/mp4/{video_id}?r={random_num}".format(video_id=video_id,
|
||||
random_num=str(rand_num)[2:])
|
||||
e = o = r = -1
|
||||
i, a = 0, len(path)
|
||||
while i < a:
|
||||
e = ord(path[i])
|
||||
i += 1
|
||||
if e < 128:
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ e)]
|
||||
else:
|
||||
if e < 2048:
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (192 | e >> 6 & 31))]
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | 63 & e))]
|
||||
else:
|
||||
if 55296 <= e < 57344:
|
||||
e = (1023 & e) + 64
|
||||
i += 1
|
||||
o = 1023 & t.url(i)
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (240 | e >> 8 & 7))]
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | e >> 2 & 63))]
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | o >> 6 & 15 | (3 & e) << 4))]
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | 63 & o))]
|
||||
else:
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (224 | e >> 12 & 15))]
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | e >> 6 & 63))]
|
||||
r = unsigned_right_shitf(r, 8) ^ data[255 & (r ^ (128 | 63 & e))]
|
||||
|
||||
return "https://ib.365yg.com{path}&s={param}".format(path=path, param=unsigned_right_shitf(r ^ -1, 0))
|
||||
|
||||
while 1:
|
||||
url = tmp()
|
||||
if url.split("=")[-1][0] != "-": # 参数s不能为负数
|
||||
return url
|
||||
|
||||
|
||||
def ixigua_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
# example url: https://www.ixigua.com/i6631065141750268420/#mid=63024814422
|
||||
html = get_html(url, faker=True)
|
||||
video_id = match1(html, r"\"vid\":\"([^\"]+)")
|
||||
title = match1(html, r"\"title\":\"(\S+?)\",")
|
||||
if not video_id:
|
||||
log.e("video_id not found, url:{}".format(url))
|
||||
return
|
||||
video_info_url = get_video_url_from_video_id(video_id)
|
||||
video_info = loads(get_content(video_info_url))
|
||||
if video_info.get("code", 1) != 0:
|
||||
log.e("Get video info from {} error: server return code {}".format(video_info_url, video_info.get("code", 1)))
|
||||
return
|
||||
if not video_info.get("data", None):
|
||||
log.e("Get video info from {} error: The server returns JSON value"
|
||||
" without data or data is empty".format(video_info_url))
|
||||
return
|
||||
if not video_info["data"].get("video_list", None):
|
||||
log.e("Get video info from {} error: The server returns JSON value"
|
||||
" without data.video_list or data.video_list is empty".format(video_info_url))
|
||||
return
|
||||
if not video_info["data"]["video_list"].get("video_1", None):
|
||||
log.e("Get video info from {} error: The server returns JSON value"
|
||||
" without data.video_list.video_1 or data.video_list.video_1 is empty".format(video_info_url))
|
||||
return
|
||||
size = int(video_info["data"]["video_list"]["video_1"]["size"])
|
||||
print_info(site_info=site_info, title=title, type="mp4", size=size) # 该网站只有mp4类型文件
|
||||
if not info_only:
|
||||
download_urls([video_url], title, ext, size, output_dir=output_dir)
|
||||
video_url = base64.b64decode(video_info["data"]["video_list"]["video_1"]["main_url"].encode("utf-8"))
|
||||
download_urls([video_url.decode("utf-8")], title, "mp4", size, output_dir, merge=merge, headers=headers, **kwargs)
|
||||
|
||||
|
||||
def ixigua_download_playlist_by_url(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
assert "user" in url, "Only support users to publish video list,Please provide a similar url:" \
|
||||
"https://www.ixigua.com/c/user/6907091136/"
|
||||
|
||||
user_id = url.split("/")[-2] if url[-1] == "/" else url.split("/")[-1]
|
||||
params = {"max_behot_time": "0", "max_repin_time": "0", "count": "20", "page_type": "0", "user_id": user_id}
|
||||
while 1:
|
||||
url = "https://www.ixigua.com/c/user/article/?" + "&".join(["{}={}".format(k, v) for k, v in params.items()])
|
||||
video_list = loads(get_content(url, headers=headers))
|
||||
params["max_behot_time"] = video_list["next"]["max_behot_time"]
|
||||
for video in video_list["data"]:
|
||||
ixigua_download("https://www.ixigua.com/i{}/".format(video["item_id"]), output_dir, merge, info_only,
|
||||
**kwargs)
|
||||
if video_list["next"]["max_behot_time"] == 0:
|
||||
break
|
||||
|
||||
def ixigua_download_playlist(url, output_dir='.', info_only=False, **kwargs):
|
||||
"""Download all video from the user's video list
|
||||
Sample URL: https://www.ixigua.com/c/user/71141690831/
|
||||
"""
|
||||
if 'user' not in url:
|
||||
raise NotImplementedError(url)
|
||||
user_id = url.split('/')[-2]
|
||||
max_behot_time = 0
|
||||
if not user_id:
|
||||
raise NotImplementedError(url)
|
||||
base_url = "https://www.ixigua.com/c/user/article/?user_id={user_id}" \
|
||||
"&max_behot_time={max_behot_time}&max_repin_time=0&count=20&page_type=0"
|
||||
_param = {
|
||||
'user_id': user_id,
|
||||
'base_url': base_url,
|
||||
'video_list': [],
|
||||
'url': base_url.format(user_id=user_id, max_behot_time=max_behot_time),
|
||||
}
|
||||
for i in get_moment(**_param):
|
||||
ixigua_download(i, output_dir, info_only, **kwargs)
|
||||
|
||||
site_info = "ixigua.com"
|
||||
download = ixigua_download
|
||||
download_playlist = ixigua_download_playlist
|
||||
download_playlist = ixigua_download_playlist_by_url
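The table-based loop in get_video_url_from_video_id above appears to be a JS port of CRC-32 (polynomial 0xEDB88320) over the request path, which is also what the removed get_s computed via binascii.crc32. A minimal sketch under that assumption, valid for plain-ASCII paths; the video id and r value below are made up:

```python
import binascii

def sign_path(path):
    # standard CRC-32 of the path, as an unsigned 32-bit integer
    return binascii.crc32(path.encode('utf-8')) & 0xFFFFFFFF

path = '/video/urls/v/1/toutiao/mp4/6631065141750268420?r=98765432109876543'
url = 'https://ib.365yg.com{}&s={}'.format(path, sign_path(path))
```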
@ -16,11 +16,14 @@ def kuaishou_download_by_url(url, info_only=False, **kwargs):
# size = video_list[-1]['size']
# result wrong size
try:
og_video_url = re.search(r"<meta\s+property=\"og:video:url\"\s+content=\"(.+?)\"/>", page).group(1)
video_url = og_video_url
title = url.split('/')[-1]
search_result=re.search(r"\"playUrls\":\[(\{\"quality\"\:\"\w+\",\"url\":\".*?\"\})+\]", page)
all_video_info_str = search_result.group(1)
all_video_infos=re.findall(r"\{\"quality\"\:\"(\w+)\",\"url\":\"(.*?)\"\}", all_video_info_str)
# get the one of the best quality
video_url = all_video_infos[0][1].encode("utf-8").decode('unicode-escape')
title = re.search(r"<meta charset=UTF-8><title>(.*?)</title>", page).group(1)
size = url_size(video_url)
video_format = video_url.split('.')[-1]
video_format = "flv"#video_url.split('.')[-1]
print_info(site_info, title, video_format, size)
if not info_only:
download_urls([video_url], title, video_format, size, **kwargs)
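The new kuaishou code picks the escaped URL out of the inlined playUrls JSON and unescapes it; a minimal sketch with a made-up value:

```python
# hypothetical escaped URL from the page JSON, for illustration only
raw = r'http:\u002F\u002Ftxmov2.a.yximgs.com\u002Fupic\u002Fsample.mp4'
video_url = raw.encode('utf-8').decode('unicode-escape')
# video_url == 'http://txmov2.a.yximgs.com/upic/sample.mp4'
```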
@ -8,46 +8,88 @@ from base64 import b64decode
|
||||
import re
|
||||
import hashlib
|
||||
|
||||
|
||||
def kugou_download(url, output_dir=".", merge=True, info_only=False, **kwargs):
|
||||
if url.lower().find("5sing")!=-1:
|
||||
#for 5sing.kugou.com
|
||||
html=get_html(url)
|
||||
ticket=r1(r'"ticket":\s*"(.*)"',html)
|
||||
j=loads(str(b64decode(ticket),encoding="utf-8"))
|
||||
url=j['file']
|
||||
title=j['songName']
|
||||
if url.lower().find("5sing") != -1:
|
||||
# for 5sing.kugou.com
|
||||
html = get_html(url)
|
||||
ticket = r1(r'"ticket":\s*"(.*)"', html)
|
||||
j = loads(str(b64decode(ticket), encoding="utf-8"))
|
||||
url = j['file']
|
||||
title = j['songName']
|
||||
songtype, ext, size = url_info(url)
|
||||
print_info(site_info, title, songtype, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir, merge=merge)
|
||||
elif url.lower().find("hash") != -1:
|
||||
return kugou_download_by_hash(url, output_dir, merge, info_only)
|
||||
else:
|
||||
#for the www.kugou.com/
|
||||
# for the www.kugou.com/
|
||||
return kugou_download_playlist(url, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
# raise NotImplementedError(url)
|
||||
|
||||
def kugou_download_by_hash(title,hash_val,output_dir = '.', merge = True, info_only = False):
|
||||
#sample
|
||||
#url_sample:http://www.kugou.com/yy/album/single/536957.html
|
||||
#hash ->key md5(hash+kgcloud")->key decompile swf
|
||||
#cmd 4 for mp3 cmd 3 for m4a
|
||||
key=hashlib.new('md5',(hash_val+"kgcloud").encode("utf-8")).hexdigest()
|
||||
html=get_html("http://trackercdn.kugou.com/i/?pid=6&key=%s&acceptMp3=1&cmd=4&hash=%s"%(key,hash_val))
|
||||
j=loads(html)
|
||||
url=j['url']
|
||||
|
||||
def kugou_download_by_hash(url, output_dir='.', merge=True, info_only=False):
|
||||
# sample
|
||||
# url_sample:http://www.kugou.com/song/#hash=93F7D2FC6E95424739448218B591AEAF&album_id=9019462
|
||||
hash_val = match1(url, 'hash=(\w+)')
|
||||
album_id = match1(url, 'album_id=(\d+)')
|
||||
if not album_id:
|
||||
album_id = 123
|
||||
html = get_html("http://www.kugou.com/yy/index.php?r=play/getdata&hash={}&album_id={}&mid=123".format(hash_val, album_id))
|
||||
j = loads(html)
|
||||
url = j['data']['play_url']
|
||||
title = j['data']['audio_name']
|
||||
# some songs cann't play because of copyright protection
|
||||
if (url == ''):
|
||||
return
|
||||
songtype, ext, size = url_info(url)
|
||||
print_info(site_info, title, songtype, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size, output_dir, merge=merge)
|
||||
|
||||
def kugou_download_playlist(url, output_dir = '.', merge = True, info_only = False, **kwargs):
|
||||
html=get_html(url)
|
||||
pattern=re.compile('title="(.*?)".* data="(\w*)\|.*?"')
|
||||
pairs=pattern.findall(html)
|
||||
for title,hash_val in pairs:
|
||||
kugou_download_by_hash(title,hash_val,output_dir,merge,info_only)
|
||||
|
||||
def kugou_download_playlist(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
urls = []
|
||||
|
||||
# download music leaderboard
|
||||
# sample: http://www.kugou.com/yy/html/rank.html
|
||||
if url.lower().find('rank') != -1:
|
||||
html = get_html(url)
|
||||
pattern = re.compile('<a href="(http://.*?)" data-active=')
|
||||
res = pattern.findall(html)
|
||||
for song in res:
|
||||
res = get_html(song)
|
||||
pattern_url = re.compile('"hash":"(\w+)".*"album_id":(\d)+')
|
||||
hash_val, album_id = res = pattern_url.findall(res)[0]
|
||||
if not album_id:
|
||||
album_id = 123
|
||||
urls.append('http://www.kugou.com/song/#hash=%s&album_id=%s' % (hash_val, album_id))
|
||||
|
||||
# download album
|
||||
# album sample: http://www.kugou.com/yy/album/single/1645030.html
|
||||
elif url.lower().find('album') != -1:
|
||||
html = get_html(url)
|
||||
pattern = re.compile('var data=(\[.*?\]);')
|
||||
res = pattern.findall(html)[0]
|
||||
for v in json.loads(res):
|
||||
urls.append('http://www.kugou.com/song/#hash=%s&album_id=%s' % (v['hash'], v['album_id']))
|
||||
|
||||
# download the playlist
|
||||
# playlist sample:http://www.kugou.com/yy/special/single/487279.html
|
||||
else:
|
||||
html = get_html(url)
|
||||
pattern = re.compile('data="(\w+)\|(\d+)"')
|
||||
for v in pattern.findall(html):
|
||||
urls.append('http://www.kugou.com/song/#hash=%s&album_id=%s' % (v[0], v[1]))
|
||||
print('http://www.kugou.com/song/#hash=%s&album_id=%s' % (v[0], v[1]))
|
||||
|
||||
# download the list by hash
|
||||
for url in urls:
|
||||
kugou_download_by_hash(url, output_dir, merge, info_only)
|
||||
|
||||
|
||||
site_info = "kugou.com"
|
||||
download = kugou_download
|
||||
# download_playlist = playlist_not_supported("kugou")
|
||||
download_playlist=kugou_download_playlist
|
||||
download_playlist = kugou_download_playlist
|
||||
|
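The rewritten kugou extractor above resolves songs through the play/getdata endpoint instead of the old md5(hash+"kgcloud") tracker key; a sketch of the request it builds, reusing the hash and album_id from the sample URL in the diff:

```python
hash_val = '93F7D2FC6E95424739448218B591AEAF'
album_id = '9019462'
api = 'http://www.kugou.com/yy/index.php?r=play/getdata&hash={}&album_id={}&mid=123'.format(hash_val, album_id)
# j = json.loads(get_html(api)); j['data']['play_url'] is the stream URL and
# j['data']['audio_name'] the title; an empty play_url means the song is copyright-blocked.
```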
@ -2,20 +2,23 @@
|
||||
|
||||
__all__ = ['letv_download', 'letvcloud_download', 'letvcloud_download_by_vu']
|
||||
|
||||
import json
|
||||
import base64
|
||||
import hashlib
|
||||
import random
|
||||
import xml.etree.ElementTree as ET
|
||||
import base64, hashlib, urllib, time, re
|
||||
import urllib
|
||||
|
||||
from ..common import *
|
||||
|
||||
#@DEPRECATED
|
||||
|
||||
# @DEPRECATED
|
||||
def get_timestamp():
|
||||
tn = random.random()
|
||||
url = 'http://api.letv.com/time?tn={}'.format(tn)
|
||||
result = get_content(url)
|
||||
return json.loads(result)['stime']
|
||||
#@DEPRECATED
|
||||
|
||||
|
||||
# @DEPRECATED
|
||||
def get_key(t):
|
||||
for s in range(0, 8):
|
||||
e = 1 & t
|
||||
@ -24,42 +27,40 @@ def get_key(t):
|
||||
t += e
|
||||
return t ^ 185025305
|
||||
|
||||
|
||||
def calcTimeKey(t):
|
||||
ror = lambda val, r_bits, : ((val & (2**32-1)) >> r_bits%32) | (val << (32-(r_bits%32)) & (2**32-1))
|
||||
ror = lambda val, r_bits,: ((val & (2 ** 32 - 1)) >> r_bits % 32) | (val << (32 - (r_bits % 32)) & (2 ** 32 - 1))
|
||||
magic = 185025305
|
||||
return ror(t, magic % 17) ^ magic
|
||||
#return ror(ror(t,773625421%13)^773625421,773625421%17)
|
||||
# return ror(ror(t,773625421%13)^773625421,773625421%17)
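The ror lambda in calcTimeKey is a 32-bit rotate-right; a standalone sketch of the same computation, with an arbitrary example timestamp:

```python
def ror32(val, r_bits):
    # rotate a 32-bit value right by r_bits, as the lambda above does
    r_bits %= 32
    val &= 0xFFFFFFFF
    return ((val >> r_bits) | (val << (32 - r_bits))) & 0xFFFFFFFF

magic = 185025305
tkey = ror32(1546300800, magic % 17) ^ magic  # same result as calcTimeKey(1546300800)
```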
|
||||
|
||||
|
||||
def decode(data):
|
||||
version = data[0:5]
|
||||
if version.lower() == b'vc_01':
|
||||
#get real m3u8
|
||||
# get real m3u8
|
||||
loc2 = data[5:]
|
||||
length = len(loc2)
|
||||
loc4 = [0]*(2*length)
|
||||
loc4 = [0] * (2 * length)
|
||||
for i in range(length):
|
||||
loc4[2*i] = loc2[i] >> 4
|
||||
loc4[2*i+1]= loc2[i] & 15;
|
||||
loc6 = loc4[len(loc4)-11:]+loc4[:len(loc4)-11]
|
||||
loc7 = [0]*length
|
||||
loc4[2 * i] = loc2[i] >> 4
|
||||
loc4[2 * i + 1] = loc2[i] & 15;
|
||||
loc6 = loc4[len(loc4) - 11:] + loc4[:len(loc4) - 11]
|
||||
loc7 = [0] * length
|
||||
for i in range(length):
|
||||
loc7[i] = (loc6[2 * i] << 4) +loc6[2*i+1]
|
||||
loc7[i] = (loc6[2 * i] << 4) + loc6[2 * i + 1]
|
||||
return ''.join([chr(i) for i in loc7])
|
||||
else:
|
||||
# directly return
|
||||
return data
|
||||
return str(data)
|
||||
|
||||
|
||||
|
||||
|
||||
def video_info(vid,**kwargs):
|
||||
url = 'http://player-pc.le.com/mms/out/video/playJson?id={}&platid=1&splatid=101&format=1&tkey={}&domain=www.le.com®ion=cn&source=1000&accesyx=1'.format(vid,calcTimeKey(int(time.time())))
|
||||
def video_info(vid, **kwargs):
|
||||
url = 'http://player-pc.le.com/mms/out/video/playJson?id={}&platid=1&splatid=105&format=1&tkey={}&domain=www.le.com®ion=cn&source=1000&accesyx=1'.format(vid, calcTimeKey(int(time.time())))
|
||||
r = get_content(url, decoded=False)
|
||||
info=json.loads(str(r,"utf-8"))
|
||||
info = json.loads(str(r, "utf-8"))
|
||||
info = info['msgs']
|
||||
|
||||
|
||||
stream_id = None
|
||||
support_stream_id = info["playurl"]["dispatch"].keys()
|
||||
if "stream_id" in kwargs and kwargs["stream_id"].lower() in support_stream_id:
|
||||
@ -70,27 +71,28 @@ def video_info(vid,**kwargs):
|
||||
elif "720p" in support_stream_id:
|
||||
stream_id = '720p'
|
||||
else:
|
||||
stream_id =sorted(support_stream_id,key= lambda i: int(i[1:]))[-1]
|
||||
stream_id = sorted(support_stream_id, key=lambda i: int(i[1:]))[-1]
|
||||
|
||||
url =info["playurl"]["domain"][0]+info["playurl"]["dispatch"][stream_id][0]
|
||||
url = info["playurl"]["domain"][0] + info["playurl"]["dispatch"][stream_id][0]
|
||||
uuid = hashlib.sha1(url.encode('utf8')).hexdigest() + '_0'
|
||||
ext = info["playurl"]["dispatch"][stream_id][1].split('.')[-1]
|
||||
url = url.replace('tss=0', 'tss=ios')
|
||||
url+="&m3v=1&termid=1&format=1&hwtype=un&ostype=MacOS10.12.4&p1=1&p2=10&p3=-&expect=3&tn={}&vid={}&uuid={}&sign=letv".format(random.random(), vid, uuid)
|
||||
url += "&m3v=1&termid=1&format=1&hwtype=un&ostype=MacOS10.12.4&p1=1&p2=10&p3=-&expect=3&tn={}&vid={}&uuid={}&sign=letv".format(random.random(), vid, uuid)
|
||||
|
||||
r2=get_content(url,decoded=False)
|
||||
info2=json.loads(str(r2,"utf-8"))
|
||||
r2 = get_content(url, decoded=False)
|
||||
info2 = json.loads(str(r2, "utf-8"))
|
||||
|
||||
# hold on ! more things to do
|
||||
# to decode m3u8 (encoded)
|
||||
suffix = '&r=' + str(int(time.time() * 1000)) + '&appid=500'
|
||||
m3u8 = get_content(info2["location"]+suffix,decoded=False)
|
||||
m3u8 = get_content(info2["location"] + suffix, decoded=False)
|
||||
m3u8_list = decode(m3u8)
|
||||
urls = re.findall(r'^[^#][^\r]*',m3u8_list,re.MULTILINE)
|
||||
return ext,urls
|
||||
urls = re.findall(r'(http.*?)#', m3u8_list, re.MULTILINE)
|
||||
return ext, urls
|
||||
|
||||
def letv_download_by_vid(vid,title, output_dir='.', merge=True, info_only=False,**kwargs):
|
||||
ext , urls = video_info(vid,**kwargs)
|
||||
|
||||
def letv_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
ext, urls = video_info(vid, **kwargs)
|
||||
size = 0
|
||||
for i in urls:
|
||||
_, _, tmp = url_info(i)
|
||||
@ -100,27 +102,29 @@ def letv_download_by_vid(vid,title, output_dir='.', merge=True, info_only=False,
|
||||
if not info_only:
|
||||
download_urls(urls, title, ext, size, output_dir=output_dir, merge=merge)
|
||||
|
||||
|
||||
def letvcloud_download_by_vu(vu, uu, title=None, output_dir='.', merge=True, info_only=False):
|
||||
#ran = float('0.' + str(random.randint(0, 9999999999999999))) # For ver 2.1
|
||||
#str2Hash = 'cfflashformatjsonran{ran}uu{uu}ver2.2vu{vu}bie^#@(%27eib58'.format(vu = vu, uu = uu, ran = ran) #Magic!/ In ver 2.1
|
||||
argumet_dict ={'cf' : 'flash', 'format': 'json', 'ran': str(int(time.time())), 'uu': str(uu),'ver': '2.2', 'vu': str(vu), }
|
||||
sign_key = '2f9d6924b33a165a6d8b5d3d42f4f987' #ALL YOUR BASE ARE BELONG TO US
|
||||
# ran = float('0.' + str(random.randint(0, 9999999999999999))) # For ver 2.1
|
||||
# str2Hash = 'cfflashformatjsonran{ran}uu{uu}ver2.2vu{vu}bie^#@(%27eib58'.format(vu = vu, uu = uu, ran = ran) #Magic!/ In ver 2.1
|
||||
argumet_dict = {'cf': 'flash', 'format': 'json', 'ran': str(int(time.time())), 'uu': str(uu), 'ver': '2.2', 'vu': str(vu), }
|
||||
sign_key = '2f9d6924b33a165a6d8b5d3d42f4f987' # ALL YOUR BASE ARE BELONG TO US
|
||||
str2Hash = ''.join([i + argumet_dict[i] for i in sorted(argumet_dict)]) + sign_key
|
||||
sign = hashlib.md5(str2Hash.encode('utf-8')).hexdigest()
|
||||
request_info = urllib.request.Request('http://api.letvcloud.com/gpc.php?' + '&'.join([i + '=' + argumet_dict[i] for i in argumet_dict]) + '&sign={sign}'.format(sign = sign))
|
||||
request_info = urllib.request.Request('http://api.letvcloud.com/gpc.php?' + '&'.join([i + '=' + argumet_dict[i] for i in argumet_dict]) + '&sign={sign}'.format(sign=sign))
|
||||
response = urllib.request.urlopen(request_info)
|
||||
data = response.read()
|
||||
info = json.loads(data.decode('utf-8'))
|
||||
type_available = []
|
||||
for video_type in info['data']['video_info']['media']:
|
||||
type_available.append({'video_url': info['data']['video_info']['media'][video_type]['play_url']['main_url'], 'video_quality': int(info['data']['video_info']['media'][video_type]['play_url']['vtype'])})
|
||||
urls = [base64.b64decode(sorted(type_available, key = lambda x:x['video_quality'])[-1]['video_url']).decode("utf-8")]
|
||||
urls = [base64.b64decode(sorted(type_available, key=lambda x: x['video_quality'])[-1]['video_url']).decode("utf-8")]
|
||||
size = urls_size(urls)
|
||||
ext = 'mp4'
|
||||
print_info(site_info, title, ext, size)
|
||||
if not info_only:
|
||||
download_urls(urls, title, ext, size, output_dir=output_dir, merge=merge)
|
||||
|
||||
|
||||
def letvcloud_download(url, output_dir='.', merge=True, info_only=False):
|
||||
qs = parse.urlparse(url).query
|
||||
vu = match1(qs, r'vu=([\w]+)')
|
||||
@ -128,7 +132,8 @@ def letvcloud_download(url, output_dir='.', merge=True, info_only=False):
|
||||
title = "LETV-%s" % vu
|
||||
letvcloud_download_by_vu(vu, uu, title=title, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
|
||||
def letv_download(url, output_dir='.', merge=True, info_only=False ,**kwargs):
|
||||
|
||||
def letv_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
url = url_locations([url])[0]
|
||||
if re.match(r'http://yuntv.letv.com/', url):
|
||||
letvcloud_download(url, output_dir=output_dir, merge=merge, info_only=info_only)
|
||||
@ -136,14 +141,15 @@ def letv_download(url, output_dir='.', merge=True, info_only=False ,**kwargs):
|
||||
html = get_content(url)
|
||||
vid = match1(url, r'video/(\d+)\.html')
|
||||
title = match1(html, r'<h2 class="title">([^<]+)</h2>')
|
||||
letv_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only,**kwargs)
|
||||
letv_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
else:
|
||||
html = get_content(url)
|
||||
vid = match1(url, r'http://www.letv.com/ptv/vplay/(\d+).html') or \
|
||||
match1(url, r'http://www.le.com/ptv/vplay/(\d+).html') or \
|
||||
match1(html, r'vid="(\d+)"')
|
||||
title = match1(html,r'name="irTitle" content="(.*?)"')
|
||||
letv_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only,**kwargs)
|
||||
match1(url, r'http://www.le.com/ptv/vplay/(\d+).html') or \
|
||||
match1(html, r'vid="(\d+)"')
|
||||
title = match1(html, r'name="irTitle" content="(.*?)"')
|
||||
letv_download_by_vid(vid, title=title, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
|
||||
site_info = "Le.com"
|
||||
download = letv_download
|
||||
|
@ -2,8 +2,17 @@

__all__ = ['lizhi_download']
import json
import datetime
from ..common import *

#
# Worked well but not perfect.
# TODO: add option --format={sd|hd}
#
def get_url(ep):
readable = datetime.datetime.fromtimestamp(int(ep['create_time']) / 1000).strftime('%Y/%m/%d')
return 'http://cdn5.lizhi.fm/audio/{}/{}_hd.mp3'.format(readable, ep['id'])

# radio_id: e.g. 549759 from http://www.lizhi.fm/549759/
#
# Returns a list of tuples (audio_id, title, url) for each episode

@ -23,7 +32,7 @@ def lizhi_extract_playlist_info(radio_id):
# (au_cnt), then handle pagination properly.
api_url = 'http://www.lizhi.fm/api/radio_audios?s=0&l=65535&band=%s' % radio_id
api_response = json.loads(get_content(api_url))
return [(ep['id'], ep['name'], ep['url']) for ep in api_response]
return [(ep['id'], ep['name'], get_url(ep)) for ep in api_response]

def lizhi_download_audio(audio_id, title, url, output_dir='.', info_only=False):
filetype, ext, size = url_info(url)
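get_url above derives the CDN path from the episode's create_time, a millisecond timestamp; a sketch with a made-up episode record:

```python
import datetime

# hypothetical episode record, for illustration only
ep = {'id': '2599636066322559030', 'create_time': '1526875200000'}
readable = datetime.datetime.fromtimestamp(int(ep['create_time']) / 1000).strftime('%Y/%m/%d')
url = 'http://cdn5.lizhi.fm/audio/{}/{}_hd.mp3'.format(readable, ep['id'])
# e.g. 'http://cdn5.lizhi.fm/audio/2018/05/21/2599636066322559030_hd.mp3' (date depends on local timezone)
```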
src/you_get/extractors/longzhu.py (new file, 74 lines)
@ -0,0 +1,74 @@
#!/usr/bin/env python

__all__ = ['longzhu_download']

import json
from ..common import (
get_content,
general_m3u8_extractor,
match1,
print_info,
download_urls,
playlist_not_supported,
)
from ..common import player

def longzhu_download(url, output_dir = '.', merge=True, info_only=False, **kwargs):
web_domain = url.split('/')[2]
if (web_domain == 'star.longzhu.com') or (web_domain == 'y.longzhu.com'):
domain = url.split('/')[3].split('?')[0]
m_url = 'http://m.longzhu.com/{0}'.format(domain)
m_html = get_content(m_url)
room_id_patt = r'var\s*roomId\s*=\s*(\d+);'
room_id = match1(m_html,room_id_patt)

json_url = 'http://liveapi.plu.cn/liveapp/roomstatus?roomId={0}'.format(room_id)
content = get_content(json_url)
data = json.loads(content)
streamUri = data['streamUri']
if len(streamUri) <= 4:
raise ValueError('The live stream is not online!')
title = data['title']
streamer = data['userName']
title = str.format(streamer,': ',title)

steam_api_url = 'http://livestream.plu.cn/live/getlivePlayurl?roomId={0}'.format(room_id)
content = get_content(steam_api_url)
data = json.loads(content)
isonline = data.get('isTransfer')
if isonline == '0':
raise ValueError('The live stream is not online!')

real_url = data['playLines'][0]['urls'][0]['securityUrl']

print_info(site_info, title, 'flv', float('inf'))

if not info_only:
download_urls([real_url], title, 'flv', None, output_dir, merge=merge)

elif web_domain == 'replay.longzhu.com':
videoid = match1(url, r'(\d+)$')
json_url = 'http://liveapi.longzhu.com/livereplay/getreplayfordisplay?videoId={0}'.format(videoid)
content = get_content(json_url)
data = json.loads(content)

username = data['userName']
title = data['title']
title = str.format(username,':',title)
real_url = data['videoUrl']

if player:
print_info('Longzhu Video', title, 'm3u8', 0)
download_urls([real_url], title, 'm3u8', 0, output_dir, merge=merge)
else:
urls = general_m3u8_extractor(real_url)
print_info('Longzhu Video', title, 'm3u8', 0)
if not info_only:
download_urls(urls, title, 'ts', 0, output_dir=output_dir, merge=merge, **kwargs)

else:
raise ValueError('Wrong url or unsupported link ... {0}'.format(url))

site_info = 'longzhu.com'
download = longzhu_download
download_playlist = playlist_not_supported('longzhu')
|
@ -68,7 +68,7 @@ class MGTV(VideoExtractor):
self.title = content['data']['info']['title']
domain = content['data']['stream_domain'][0]

#stream_avalable = [i['name'] for i in content['data']['stream']]
#stream_available = [i['name'] for i in content['data']['stream']]
stream_available = {}
for i in content['data']['stream']:
stream_available[i['name']] = i['url']
@ -2,9 +2,12 @@
|
||||
|
||||
__all__ = ['miaopai_download']
|
||||
|
||||
import string
|
||||
import random
|
||||
from ..common import *
|
||||
import urllib.error
|
||||
import urllib.parse
|
||||
from ..util import fs
|
||||
|
||||
fake_headers_mobile = {
|
||||
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
|
||||
@ -20,6 +23,10 @@ def miaopai_download_by_fid(fid, output_dir = '.', merge = False, info_only = Fa
|
||||
|
||||
mobile_page = get_content(page_url, headers=fake_headers_mobile)
|
||||
url = match1(mobile_page, r'<video id=.*?src=[\'"](.*?)[\'"]\W')
|
||||
if url is None:
|
||||
wb_mp = re.search(r'<script src=([\'"])(.+?wb_mp\.js)\1>', mobile_page).group(2)
|
||||
return miaopai_download_by_wbmp(wb_mp, fid, output_dir=output_dir, merge=merge,
|
||||
info_only=info_only, total_size=None, **kwargs)
|
||||
title = match1(mobile_page, r'<title>((.|\n)+?)</title>')
|
||||
if not title:
|
||||
title = fid
|
||||
@ -29,9 +36,79 @@ def miaopai_download_by_fid(fid, output_dir = '.', merge = False, info_only = Fa
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, total_size=None, output_dir=output_dir, merge=merge)
|
||||
|
||||
#----------------------------------------------------------------------
|
||||
def miaopai_download(url, output_dir = '.', merge = False, info_only = False, **kwargs):
|
||||
fid = match1(url, r'\?fid=(\d{4}:\w{32})')
|
||||
|
||||
def miaopai_download_by_wbmp(wbmp_url, fid, info_only=False, **kwargs):
|
||||
headers = {}
|
||||
headers.update(fake_headers_mobile)
|
||||
headers['Host'] = 'imgaliyuncdn.miaopai.com'
|
||||
wbmp = get_content(wbmp_url, headers=headers)
|
||||
appid = re.search(r'appid:\s*?([^,]+?),', wbmp).group(1)
|
||||
jsonp = re.search(r'jsonp:\s*?([\'"])(\w+?)\1', wbmp).group(2)
|
||||
population = [i for i in string.ascii_lowercase] + [i for i in string.digits]
|
||||
info_url = '{}?{}'.format('http://p.weibo.com/aj_media/info', parse.urlencode({
|
||||
'appid': appid.strip(),
|
||||
'fid': fid,
|
||||
jsonp.strip(): '_jsonp' + ''.join(random.sample(population, 11))
|
||||
}))
|
||||
headers['Host'] = 'p.weibo.com'
|
||||
jsonp_text = get_content(info_url, headers=headers)
|
||||
jsonp_dict = json.loads(match1(jsonp_text, r'\(({.+})\)'))
|
||||
if jsonp_dict['code'] != 200:
|
||||
log.wtf('[Failed] "%s"' % jsonp_dict['msg'])
|
||||
video_url = jsonp_dict['data']['meta_data'][0]['play_urls']['l']
|
||||
title = jsonp_dict['data']['description']
|
||||
title = title.replace('\n', '_')
|
||||
ext = 'mp4'
|
||||
headers['Host'] = 'f.us.sinaimg.cn'
|
||||
print_info(site_info, title, ext, url_info(video_url, headers=headers)[2])
|
||||
if not info_only:
|
||||
download_urls([video_url], fs.legitimize(title), ext, headers=headers, **kwargs)
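miaopai_download_by_wbmp above assembles a JSONP request against p.weibo.com from the appid and callback name scraped out of wb_mp.js; a rough sketch of that URL, where every value is made up and the 'jsonp' query key is only a stand-in for whatever wb_mp.js actually declares:

```python
import random
import string
import urllib.parse

appid, fid = '584', '1034:0988e0346e4fd37a4ea1a8752e6a19c5'  # hypothetical values
population = list(string.ascii_lowercase) + list(string.digits)
callback = '_jsonp' + ''.join(random.sample(population, 11))
info_url = 'http://p.weibo.com/aj_media/info?' + urllib.parse.urlencode({
    'appid': appid,
    'fid': fid,
    'jsonp': callback,  # stand-in key name; the extractor reads it from wb_mp.js
})
```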
|
||||
|
||||
|
||||
def miaopai_download_story(url, output_dir='.', merge=False, info_only=False, **kwargs):
|
||||
data_url = 'https://m.weibo.cn/s/video/object?%s' % url.split('?')[1]
|
||||
data_content = get_content(data_url, headers=fake_headers_mobile)
|
||||
data = json.loads(data_content)
|
||||
title = data['data']['object']['summary']
|
||||
stream_url = data['data']['object']['stream']['url']
|
||||
|
||||
ext = 'mp4'
|
||||
print_info(site_info, title, ext, url_info(stream_url, headers=fake_headers_mobile)[2])
|
||||
if not info_only:
|
||||
download_urls([stream_url], fs.legitimize(title), ext, total_size=None, headers=fake_headers_mobile, **kwargs)
|
||||
|
||||
|
||||
def miaopai_download_direct(url, output_dir='.', merge=False, info_only=False, **kwargs):
|
||||
mobile_page = get_content(url, headers=fake_headers_mobile)
|
||||
try:
|
||||
title = re.search(r'([\'"])title\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
|
||||
except:
|
||||
title = re.search(r'([\'"])status_title\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
|
||||
title = title.replace('\n', '_')
|
||||
try:
|
||||
stream_url = re.search(r'([\'"])stream_url\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
|
||||
except:
|
||||
page_url = re.search(r'([\'"])page_url\1:\s*([\'"])(.+?)\2,', mobile_page).group(3)
|
||||
return miaopai_download_story(page_url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
|
||||
|
||||
ext = 'mp4'
|
||||
print_info(site_info, title, ext, url_info(stream_url, headers=fake_headers_mobile)[2])
|
||||
if not info_only:
|
||||
download_urls([stream_url], fs.legitimize(title), ext, total_size=None, headers=fake_headers_mobile, **kwargs)
|
||||
|
||||
|
||||
def miaopai_download(url, output_dir='.', merge=False, info_only=False, **kwargs):
|
||||
if re.match(r'^http[s]://.*\.weibo\.com/\d+/.+', url):
|
||||
return miaopai_download_direct(url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
|
||||
|
||||
if re.match(r'^http[s]://.*\.weibo\.(com|cn)/s/video/.+', url):
|
||||
return miaopai_download_story(url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
|
||||
|
||||
# FIXME!
|
||||
if re.match(r'^http[s]://.*\.weibo\.com/tv/v/(\w+)', url):
|
||||
return miaopai_download_direct(url, info_only=info_only, output_dir=output_dir, merge=merge, **kwargs)
|
||||
|
||||
fid = match1(url, r'\?fid=(\d{4}:\w+)')
|
||||
if fid is not None:
|
||||
miaopai_download_by_fid(fid, output_dir, merge, info_only)
|
||||
elif '/p/230444' in url:
|
||||
@ -46,6 +123,7 @@ def miaopai_download(url, output_dir = '.', merge = False, info_only = False, **
|
||||
escaped_url = hit.group(1)
|
||||
miaopai_download(urllib.parse.unquote(escaped_url), output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
|
||||
site_info = "miaopai"
|
||||
download = miaopai_download
|
||||
download_playlist = playlist_not_supported('miaopai')
|
||||
|
@ -7,31 +7,40 @@ import re
|
||||
|
||||
from ..util import log
|
||||
from ..common import get_content, download_urls, print_info, playlist_not_supported, url_size
|
||||
from .universal import *
|
||||
|
||||
__all__ = ['naver_download_by_url']
|
||||
|
||||
|
||||
def naver_download_by_url(url, info_only=False, **kwargs):
|
||||
def naver_download_by_url(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
ep = 'https://apis.naver.com/rmcnmv/rmcnmv/vod/play/v2.0/{}?key={}'
|
||||
page = get_content(url)
|
||||
og_video_url = re.search(r"<meta\s+property=\"og:video:url\"\s+content='(.+?)'>", page).group(1)
|
||||
params_dict = urllib.parse.parse_qs(urllib.parse.urlparse(og_video_url).query)
|
||||
vid = params_dict['vid'][0]
|
||||
key = params_dict['outKey'][0]
|
||||
meta_str = get_content(ep.format(vid, key))
|
||||
meta_json = json.loads(meta_str)
|
||||
if 'errorCode' in meta_json:
|
||||
log.wtf(meta_json['errorCode'])
|
||||
title = meta_json['meta']['subject']
|
||||
videos = meta_json['videos']['list']
|
||||
video_list = sorted(videos, key=lambda video: video['encodingOption']['width'])
|
||||
video_url = video_list[-1]['source']
|
||||
# size = video_list[-1]['size']
|
||||
# result wrong size
|
||||
size = url_size(video_url)
|
||||
print_info(site_info, title, 'mp4', size)
|
||||
if not info_only:
|
||||
download_urls([video_url], title, 'mp4', size, **kwargs)
|
||||
try:
|
||||
temp = re.search(r"<meta\s+property=\"og:video:url\"\s+content='(.+?)'>", page)
|
||||
if temp is not None:
|
||||
og_video_url = temp.group(1)
|
||||
params_dict = urllib.parse.parse_qs(urllib.parse.urlparse(og_video_url).query)
|
||||
vid = params_dict['vid'][0]
|
||||
key = params_dict['outKey'][0]
|
||||
else:
|
||||
vid = re.search(r"\"videoId\"\s*:\s*\"(.+?)\"", page).group(1)
|
||||
key = re.search(r"\"inKey\"\s*:\s*\"(.+?)\"", page).group(1)
|
||||
meta_str = get_content(ep.format(vid, key))
|
||||
meta_json = json.loads(meta_str)
|
||||
if 'errorCode' in meta_json:
|
||||
log.wtf(meta_json['errorCode'])
|
||||
title = meta_json['meta']['subject']
|
||||
videos = meta_json['videos']['list']
|
||||
video_list = sorted(videos, key=lambda video: video['encodingOption']['width'])
|
||||
video_url = video_list[-1]['source']
|
||||
# size = video_list[-1]['size']
|
||||
# result wrong size
|
||||
size = url_size(video_url)
|
||||
print_info(site_info, title, 'mp4', size)
|
||||
if not info_only:
|
||||
download_urls([video_url], title, 'mp4', size, **kwargs)
|
||||
except:
|
||||
universal_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
|
||||
site_info = "naver.com"
|
||||
download = naver_download_by_url
|
||||
|
@ -1,43 +0,0 @@
#!/usr/bin/env python

__all__ = ['panda_download']

from ..common import *
from ..util.log import *
import json
import time

def panda_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
roomid = re.search('/(\d+)', url)
if roomid is None:
log.wtf('Cannot found room id for this url')
roomid = roomid.group(1)
json_request_url ="http://www.panda.tv/api_room_v2?roomid={}&__plat=pc_web&_={}".format(roomid, int(time.time()))
content = get_html(json_request_url)
api_json = json.loads(content)

errno = api_json["errno"]
errmsg = api_json["errmsg"]
if errno:
raise ValueError("Errno : {}, Errmsg : {}".format(errno, errmsg))
data = api_json["data"]
title = data["roominfo"]["name"]
room_key = data["videoinfo"]["room_key"]
plflag = data["videoinfo"]["plflag"].split("_")
status = data["videoinfo"]["status"]
if status is not "2":
raise ValueError("The live stream is not online! (status:%s)" % status)

data2 = json.loads(data["videoinfo"]["plflag_list"])
rid = data2["auth"]["rid"]
sign = data2["auth"]["sign"]
ts = data2["auth"]["time"]
real_url = "http://pl{}.live.panda.tv/live_panda/{}.flv?sign={}&ts={}&rid={}".format(plflag[1], room_key, sign, ts, rid)

print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_urls([real_url], title, 'flv', None, output_dir, merge = merge)

site_info = "panda.tv"
download = panda_download
download_playlist = playlist_not_supported('panda')
@ -190,16 +190,16 @@ class PPTV(VideoExtractor):

def prepare(self, **kwargs):
if self.url and not self.vid:
if not re.match(r'http://v.pptv.com/show/(\w+)\.html', self.url):
if not re.match(r'https?://v.pptv.com/show/(\w+)\.html', self.url):
raise('Unknown url pattern')
page_content = get_content(self.url)
page_content = get_content(self.url,{"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"})
self.vid = match1(page_content, r'webcfg\s*=\s*{"id":\s*(\d+)')

if not self.vid:
raise('Cannot find id')
api_url = 'http://web-play.pptv.com/webplay3-0-{}.xml'.format(self.vid)
api_url += '?appplt=flp&appid=pptv.flashplayer.vod&appver=3.4.2.28&type=&version=4'
dom = parseString(get_content(api_url))
dom = parseString(get_content(api_url,{"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"}))
self.title, m_items, m_streams, m_segs = parse_pptv_xml(dom)
xml_streams = merge_meta(m_items, m_streams, m_segs)
for stream_id in xml_streams:
@ -58,7 +58,7 @@ class QiE(VideoExtractor):
content = loads(content)
self.title = content['data']['room_name']
rtmp_url = content['data']['rtmp_url']
#stream_avalable = [i['name'] for i in content['data']['stream']]
#stream_available = [i['name'] for i in content['data']['stream']]
stream_available = {}
stream_available['normal'] = rtmp_url + '/' + content['data']['rtmp_live']
if len(content['data']['rtmp_multi_bitrate']) > 0:
@ -2,36 +2,44 @@
|
||||
|
||||
__all__ = ['qq_download']
|
||||
|
||||
from ..common import *
|
||||
from ..util.log import *
|
||||
from .qie import download as qieDownload
|
||||
from .qie_video import download_by_url as qie_video_download
|
||||
from urllib.parse import urlparse,parse_qs
|
||||
from ..common import *
|
||||
|
||||
|
||||
def qq_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False):
|
||||
info_api = 'http://vv.video.qq.com/getinfo?otype=json&appver=3.2.19.333&platform=11&defnpayver=1&vid={}'.format(vid)
|
||||
info = get_content(info_api)
|
||||
video_json = json.loads(match1(info, r'QZOutputJson=(.*)')[:-1])
|
||||
|
||||
# http://v.sports.qq.com/#/cover/t0fqsm1y83r8v5j/a0026nvw5jr https://v.qq.com/x/cover/t0fqsm1y83r8v5j/a0026nvw5jr.html
|
||||
video_json = None
|
||||
platforms = [4100201, 11]
|
||||
for platform in platforms:
|
||||
info_api = 'http://vv.video.qq.com/getinfo?otype=json&appver=3.2.19.333&platform={}&defnpayver=1&defn=shd&vid={}'.format(platform, vid)
|
||||
info = get_content(info_api)
|
||||
video_json = json.loads(match1(info, r'QZOutputJson=(.*)')[:-1])
|
||||
if not video_json.get('msg')=='cannot play outside':
|
||||
break
|
||||
fn_pre = video_json['vl']['vi'][0]['lnk']
|
||||
title = video_json['vl']['vi'][0]['ti']
|
||||
host = video_json['vl']['vi'][0]['ul']['ui'][0]['url']
|
||||
streams = video_json['fl']['fi']
|
||||
seg_cnt = video_json['vl']['vi'][0]['cl']['fc']
|
||||
seg_cnt = fc_cnt = video_json['vl']['vi'][0]['cl']['fc']
|
||||
|
||||
filename = video_json['vl']['vi'][0]['fn']
|
||||
if seg_cnt == 0:
|
||||
seg_cnt = 1
|
||||
|
||||
best_quality = streams[-1]['name']
|
||||
part_format_id = streams[-1]['id']
|
||||
else:
|
||||
fn_pre, magic_str, video_type = filename.split('.')
|
||||
|
||||
part_urls= []
|
||||
total_size = 0
|
||||
for part in range(1, seg_cnt+1):
|
||||
#if seg_cnt == 1 and video_json['vl']['vi'][0]['vh'] <= 480:
|
||||
# filename = fn_pre + '.mp4'
|
||||
#else:
|
||||
# filename = fn_pre + '.p' + str(part_format_id % 10000) + '.' + str(part) + '.mp4'
|
||||
filename = fn_pre + '.p' + str(part_format_id % 10000) + '.' + str(part) + '.mp4'
|
||||
if fc_cnt == 0:
|
||||
# fix json parsing error
|
||||
# example:https://v.qq.com/x/page/w0674l9yrrh.html
|
||||
part_format_id = video_json['vl']['vi'][0]['cl']['keyid'].split('.')[-1]
|
||||
else:
|
||||
part_format_id = video_json['vl']['vi'][0]['cl']['ci'][part - 1]['keyid'].split('.')[1]
|
||||
filename = '.'.join([fn_pre, magic_str, str(part), video_type])
|
||||
|
||||
key_api = "http://vv.video.qq.com/getkey?otype=json&platform=11&format={}&vid={}&filename={}&appver=3.2.19.333".format(part_format_id, vid, filename)
|
||||
part_info = get_content(key_api)
|
||||
key_json = json.loads(match1(part_info, r'QZOutputJson=(.*)')[:-1])
|
||||
@ -47,6 +55,9 @@ def qq_download_by_vid(vid, title, output_dir='.', merge=True, info_only=False):
|
||||
else:
|
||||
log.w(key_json['msg'])
|
||||
break
|
||||
if key_json.get('filename') is None:
|
||||
log.w(key_json['msg'])
|
||||
break
|
||||
|
||||
part_urls.append(url)
|
||||
_, ext, size = url_info(url)
|
||||
@ -96,6 +107,7 @@ def kg_qq_download_by_shareid(shareid, output_dir='.', info_only=False, caption=
|
||||
|
||||
def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
""""""
|
||||
|
||||
if re.match(r'https?://egame.qq.com/live\?anchorid=(\d+)', url):
|
||||
from . import qq_egame
|
||||
qq_egame.qq_egame_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
@ -121,19 +133,15 @@ def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
qq_download_by_vid(vid, vid, output_dir, merge, info_only)
|
||||
return
|
||||
|
||||
#do redirect
|
||||
if 'v.qq.com/page' in url:
|
||||
# for URLs like this:
|
||||
# http://v.qq.com/page/k/9/7/k0194pwgw97.html
|
||||
new_url = url_locations([url])[0]
|
||||
if url == new_url:
|
||||
#redirect in js?
|
||||
content = get_content(url)
|
||||
url = match1(content,r'window\.location\.href="(.*?)"')
|
||||
else:
|
||||
url = new_url
|
||||
|
||||
if 'kuaibao.qq.com' in url or re.match(r'http://daxue.qq.com/content/content/id/\d+', url):
|
||||
if 'kuaibao.qq.com/s/' in url:
|
||||
# https://kuaibao.qq.com/s/20180521V0Z9MH00
|
||||
nid = match1(url, r'/s/([^/&?#]+)')
|
||||
content = get_content('https://kuaibao.qq.com/getVideoRelate?id=' + nid)
|
||||
info_json = json.loads(content)
|
||||
vid=info_json['videoinfo']['vid']
|
||||
title=info_json['videoinfo']['title']
|
||||
elif 'kuaibao.qq.com' in url or re.match(r'http://daxue.qq.com/content/content/id/\d+', url):
|
||||
# http://daxue.qq.com/content/content/id/2321
|
||||
content = get_content(url)
|
||||
vid = match1(content, r'vid\s*=\s*"\s*([^"]+)"')
|
||||
title = match1(content, r'title">([^"]+)</p>')
|
||||
@ -142,6 +150,11 @@ def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
vid = match1(url, r'\bvid=(\w+)')
|
||||
# for embedded URLs; don't know what the title is
|
||||
title = vid
|
||||
elif 'view.inews.qq.com' in url:
|
||||
# view.inews.qq.com/a/20180521V0Z9MH00
|
||||
content = get_content(url)
|
||||
vid = match1(content, r'"vid":"(\w+)"')
|
||||
title = match1(content, r'"title":"(\w+)"')
|
||||
else:
|
||||
content = get_content(url)
|
||||
#vid = parse_qs(urlparse(url).query).get('vid') #for links specified vid like http://v.qq.com/cover/p/ps6mnfqyrfo7es3.html?vid=q0181hpdvo5
|
||||
@ -149,6 +162,9 @@ def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
vid = ""
|
||||
if rurl:
|
||||
vid = rurl.split('/')[-1].split('.')[0]
|
||||
# https://v.qq.com/x/page/d0552xbadkl.html https://y.qq.com/n/yqq/mv/v/g00268vlkzy.html
|
||||
if vid == "undefined" or vid == "index":
|
||||
vid = ""
|
||||
vid = vid if vid else url.split('/')[-1].split('.')[0] #https://v.qq.com/x/cover/ps6mnfqyrfo7es3/q0181hpdvo5.html?
|
||||
vid = vid if vid else match1(content, r'vid"*\s*:\s*"\s*([^"]+)"') #general fallback
|
||||
if not vid:
|
||||
@ -158,6 +174,7 @@ def qq_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
title = match1(content, r'"title":"([^"]+)"') if not title else title
|
||||
title = vid if not title else title #general fallback
|
||||
|
||||
|
||||
qq_download_by_vid(vid, title, output_dir, merge, info_only)
|
||||
|
||||
site_info = "QQ.com"
|
||||
|
@ -1,28 +0,0 @@
#!/usr/bin/env python

__all__ = ['quanmin_download']

from ..common import *
import json

def quanmin_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
roomid = url.split('/')[3].split('?')[0]

json_request_url = 'http://m.quanmin.tv/json/rooms/{}/noinfo6.json'.format(roomid)
content = get_html(json_request_url)
data = json.loads(content)

title = data["title"]

if not data["play_status"]:
raise ValueError("The live stream is not online!")

real_url = data["live"]["ws"]["flv"]["5"]["src"]

print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_urls([real_url], title, 'flv', None, output_dir, merge = merge)

site_info = "quanmin.tv"
download = quanmin_download
download_playlist = playlist_not_supported('quanmin')
@@ -15,11 +15,13 @@ Changelog:
new api
'''

def real_url(host,vid,tvid,new,clipURL,ck):
url = 'http://'+host+'/?prot=9&prod=flash&pt=1&file='+clipURL+'&new='+new +'&key='+ ck+'&vid='+str(vid)+'&uid='+str(int(time.time()*1000))+'&t='+str(random())+'&rb=1'
return json.loads(get_html(url))['url']

def sohu_download(url, output_dir = '.', merge = True, info_only = False, extractor_proxy=None, **kwargs):
def real_url(fileName, key, ch):
url = "https://data.vod.itc.cn/ip?new=" + fileName + "&num=1&key=" + key + "&ch=" + ch + "&pt=1&pg=2&prod=h5n"
return json.loads(get_html(url))['servers'][0]['url']


def sohu_download(url, output_dir='.', merge=True, info_only=False, extractor_proxy=None, **kwargs):
if re.match(r'http://share.vrs.sohu.com', url):
vid = r1('id=(\d+)', url)
else:

@@ -27,16 +29,16 @@ def sohu_download(url, output_dir = '.', merge = True, info_only = False, extrac
vid = r1(r'\Wvid\s*[\:=]\s*[\'"]?(\d+)[\'"]?', html)
assert vid

if re.match(r'http[s]://tv.sohu.com/', url):
if extractor_proxy:
set_proxy(tuple(extractor_proxy.split(":")))
info = json.loads(get_decoded_html('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % vid))
for qtyp in ["oriVid","superVid","highVid" ,"norVid","relativeId"]:
if extractor_proxy:
set_proxy(tuple(extractor_proxy.split(":")))
info = json.loads(get_decoded_html('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % vid))
if info and info.get("data", ""):
for qtyp in ["oriVid", "superVid", "highVid", "norVid", "relativeId"]:
if 'data' in info:
hqvid = info['data'][qtyp]
else:
hqvid = info[qtyp]
if hqvid != 0 and hqvid != vid :
if hqvid != 0 and hqvid != vid:
info = json.loads(get_decoded_html('http://hot.vrs.sohu.com/vrs_flash.action?vid=%s' % hqvid))
if not 'allot' in info:
continue

@@ -51,9 +53,8 @@ def sohu_download(url, output_dir = '.', merge = True, info_only = False, extrac
title = data['tvName']
size = sum(data['clipsBytes'])
assert len(data['clipsURL']) == len(data['clipsBytes']) == len(data['su'])
for new,clip,ck, in zip(data['su'], data['clipsURL'], data['ck']):
clipURL = urlparse(clip).path
urls.append(real_url(host,hqvid,tvid,new,clipURL,ck))
for fileName, key in zip(data['su'], data['ck']):
urls.append(real_url(fileName, key, data['ch']))
# assert data['clipsURL'][0].endswith('.mp4')

else:

@@ -64,15 +65,15 @@ def sohu_download(url, output_dir = '.', merge = True, info_only = False, extrac
urls = []
data = info['data']
title = data['tvName']
size = sum(map(int,data['clipsBytes']))
size = sum(map(int, data['clipsBytes']))
assert len(data['clipsURL']) == len(data['clipsBytes']) == len(data['su'])
for new,clip,ck, in zip(data['su'], data['clipsURL'], data['ck']):
clipURL = urlparse(clip).path
urls.append(real_url(host,vid,tvid,new,clipURL,ck))
for fileName, key in zip(data['su'], data['ck']):
urls.append(real_url(fileName, key, data['ch']))

print_info(site_info, title, 'mp4', size)
if not info_only:
download_urls(urls, title, 'mp4', size, output_dir, refer = url, merge = merge)
download_urls(urls, title, 'mp4', size, output_dir, refer=url, merge=merge)


site_info = "Sohu.com"
download = sohu_download
src/you_get/extractors/tiktok.py (new file, 21 lines)

@@ -0,0 +1,21 @@
#!/usr/bin/env python

__all__ = ['tiktok_download']

from ..common import *

def tiktok_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url, faker=True)
title = r1(r'<title.*?>(.*?)</title>', html)
video_id = r1(r'/video/(\d+)', url) or r1(r'musical\?id=(\d+)', html)
title = '%s [%s]' % (title, video_id)
source = r1(r'<video .*?src="([^"]+)"', html)
mime, ext, size = url_info(source)

print_info(site_info, title, mime, size)
if not info_only:
download_urls([source], title, ext, size, output_dir, merge=merge)

site_info = "TikTok.com"
download = tiktok_download
download_playlist = playlist_not_supported('tiktok')
@@ -1,27 +1,36 @@
#!/usr/bin/env python
import base64

import binascii

from ..common import *
import random
from json import loads
from urllib.parse import urlparse

from ..common import *

try:
from base64 import decodebytes
except ImportError:
from base64 import decodestring

decodebytes = decodestring

__all__ = ['toutiao_download', ]


def random_with_n_digits(n):
return random.randint(10 ** (n - 1), (10 ** n) - 1)


def sign_video_url(vid):
# some code from http://codecloud.net/110854.html
r = str(random.random())[2:]
r = str(random_with_n_digits(16))

def right_shift(val, n):
return val >> n if val >= 0 else (val + 0x100000000) >> n

url = 'http://i.snssdk.com/video/urls/v/1/toutiao/mp4/%s' % vid
n = url.replace("http://i.snssdk.com", "")+ '?r=' + r
c = binascii.crc32(n.encode("ascii"))
s = right_shift(c, 0)
return url + '?r=%s&s=%s' % (r, s)
url = 'https://ib.365yg.com/video/urls/v/1/toutiao/mp4/{vid}'.format(vid=vid)
n = urlparse(url).path + '?r=' + r
b_n = bytes(n, encoding="utf-8")
s = binascii.crc32(b_n)
aid = 1364
ts = int(time.time() * 1000)
return url + '?r={r}&s={s}&aid={aid}&vfrom=xgplayer&callback=axiosJsonpCallback1&_={ts}'.format(r=r, s=s, aid=aid,
ts=ts)


class ToutiaoVideoInfo(object):
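
The rewritten sign_video_url() above boils down to a CRC32 checksum over the request path plus the random `r` parameter. A minimal sketch of that computation, with made-up `vid` and `r` values (placeholders, not taken from the commit):

```
import binascii
from urllib.parse import urlparse

vid = 'v0200f50000abcdef'                 # hypothetical video id
r = '1234567890123456'                    # 16-digit random, as from random_with_n_digits(16)
url = 'https://ib.365yg.com/video/urls/v/1/toutiao/mp4/{}'.format(vid)
s = binascii.crc32(bytes(urlparse(url).path + '?r=' + r, 'utf-8'))
signed = url + '?r={}&s={}'.format(r, s)  # the real request also appends aid, vfrom, callback and a timestamp
```

The old right_shift() helper is no longer needed: in Python 3, binascii.crc32() already returns an unsigned value.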

@@ -43,12 +52,12 @@ def get_file_by_vid(video_id):
vRet = []
url = sign_video_url(video_id)
ret = get_content(url)
ret = loads(ret)
ret = loads(ret[20:-1])
vlist = ret.get('data').get('video_list')
if len(vlist) > 0:
vInfo = vlist.get(sorted(vlist.keys(), reverse=True)[0])
vUrl = vInfo.get('main_url')
vUrl = base64.decodestring(vUrl.encode('ascii')).decode('ascii')
vUrl = decodebytes(vUrl.encode('ascii')).decode('ascii')
videoInfo = ToutiaoVideoInfo()
videoInfo.bitrate = vInfo.get('bitrate')
videoInfo.definition = vInfo.get('definition')

@@ -63,8 +72,8 @@ def get_file_by_vid(video_id):

def toutiao_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
html = get_html(url, faker=True)
video_id = match1(html, r"videoid\s*:\s*'([^']+)',\n")
title = match1(html, r"title: '([^']+)'.replace")
video_id = match1(html, r".*?videoId: '(?P<vid>.*)'")
title = match1(html, '.*?<title>(?P<title>.*?)</title>')
video_file_list = get_file_by_vid(video_id)  # call the API to get the source video files
type, ext, size = url_info(video_file_list[0].url, faker=True)
print_info(site_info=site_info, title=title, type=type, size=size)
@ -13,7 +13,29 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
universal_download(url, output_dir, merge=merge, info_only=info_only)
|
||||
return
|
||||
|
||||
html = parse.unquote(get_html(url)).replace('\/', '/')
|
||||
import ssl
|
||||
ssl_context = request.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))
|
||||
cookie_handler = request.HTTPCookieProcessor()
|
||||
opener = request.build_opener(ssl_context, cookie_handler)
|
||||
request.install_opener(opener)
|
||||
|
||||
page = get_html(url)
|
||||
form_key = match1(page, r'id="tumblr_form_key" content="([^"]+)"')
|
||||
if form_key is not None:
|
||||
# bypass GDPR consent page
|
||||
referer = 'https://www.tumblr.com/privacy/consent?redirect=%s' % parse.quote_plus(url)
|
||||
post_content('https://www.tumblr.com/svc/privacy/consent',
|
||||
headers={
|
||||
'Content-Type': 'application/json',
|
||||
'User-Agent': fake_headers['User-Agent'],
|
||||
'Referer': referer,
|
||||
'X-tumblr-form-key': form_key,
|
||||
'X-Requested-With': 'XMLHttpRequest'
|
||||
},
|
||||
post_data_raw='{"eu_resident":true,"gdpr_is_acceptable_age":true,"gdpr_consent_core":true,"gdpr_consent_first_party_ads":true,"gdpr_consent_third_party_ads":true,"gdpr_consent_search_history":true,"redirect_to":"%s","gdpr_reconsent":false}' % url)
|
||||
page = get_html(url, faker=True)
|
||||
|
||||
html = parse.unquote(page).replace('\/', '/')
|
||||
feed = r1(r'<meta property="og:type" content="tumblr-feed:(\w+)" />', html)
|
||||
|
||||
if feed in ['photo', 'photoset', 'entry'] or feed is None:
|
||||
@ -21,23 +43,31 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
page_title = r1(r'<meta name="description" content="([^"\n]+)', html) or \
|
||||
r1(r'<meta property="og:description" content="([^"\n]+)', html) or \
|
||||
r1(r'<title>([^<\n]*)', html)
|
||||
urls = re.findall(r'(https?://[^;"&]+/tumblr_[^;"]+_\d+\.jpg)', html) +\
|
||||
re.findall(r'(https?://[^;"&]+/tumblr_[^;"]+_\d+\.png)', html) +\
|
||||
re.findall(r'(https?://[^;"&]+/tumblr_[^";]+_\d+\.gif)', html)
|
||||
urls = re.findall(r'(https?://[^;"&]+/tumblr_[^;"&]+_\d+\.jpg)', html) +\
|
||||
re.findall(r'(https?://[^;"&]+/tumblr_[^;"&]+_\d+\.png)', html) +\
|
||||
re.findall(r'(https?://[^;"&]+/tumblr_[^";&]+_\d+\.gif)', html)
|
||||
|
||||
tuggles = {}
|
||||
for url in urls:
|
||||
filename = parse.unquote(url.split('/')[-1])
|
||||
if url.endswith('.gif'):
|
||||
hd_url = url
|
||||
elif url.endswith('.jpg'):
|
||||
hd_url = r1(r'(.+)_\d+\.jpg$', url) + '_1280.jpg' # FIXME: decide actual quality
|
||||
elif url.endswith('.png'):
|
||||
hd_url = r1(r'(.+)_\d+\.png$', url) + '_1280.png' # FIXME: decide actual quality
|
||||
else:
|
||||
continue
|
||||
filename = parse.unquote(hd_url.split('/')[-1])
|
||||
title = '.'.join(filename.split('.')[:-1])
|
||||
tumblr_id = r1(r'^tumblr_(.+)_\d+$', title)
|
||||
quality = int(r1(r'^tumblr_.+_(\d+)$', title))
|
||||
ext = filename.split('.')[-1]
|
||||
try:
|
||||
size = int(get_head(url)['Content-Length'])
|
||||
size = int(get_head(hd_url)['Content-Length'])
|
||||
if tumblr_id not in tuggles or tuggles[tumblr_id]['quality'] < quality:
|
||||
tuggles[tumblr_id] = {
|
||||
'title': title,
|
||||
'url': url,
|
||||
'url': hd_url,
|
||||
'quality': quality,
|
||||
'ext': ext,
|
||||
'size': size,
|
||||
@ -70,6 +100,11 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
real_url = r1(r'<source src="([^"]*)"', html)
|
||||
if not real_url:
|
||||
iframe_url = r1(r'<[^>]+tumblr_video_container[^>]+><iframe[^>]+src=[\'"]([^\'"]*)[\'"]', html)
|
||||
|
||||
if iframe_url is None:
|
||||
universal_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
return
|
||||
|
||||
if iframe_url:
|
||||
iframe_html = get_content(iframe_url, headers=fake_headers)
|
||||
real_url = r1(r'<video[^>]*>[\n ]*<source[^>]+src=[\'"]([^\'"]*)[\'"]', iframe_html)
|
||||
@ -94,11 +129,15 @@ def tumblr_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
r1(r'<meta property="og:description" content="([^"]*)" />', html) or
|
||||
r1(r'<title>([^<\n]*)', html) or url.split("/")[4]).replace('\n', '')
|
||||
|
||||
type, ext, size = url_info(real_url)
|
||||
# this is better
|
||||
vcode = r1(r'tumblr_(\w+)', real_url)
|
||||
real_url = 'https://vt.media.tumblr.com/tumblr_%s.mp4' % vcode
|
||||
|
||||
type, ext, size = url_info(real_url, faker=True)
|
||||
|
||||
print_info(site_info, title, type, size)
|
||||
if not info_only:
|
||||
download_urls([real_url], title, ext, size, output_dir, merge = merge)
|
||||
download_urls([real_url], title, ext, size, output_dir, merge=merge)
|
||||
|
||||
site_info = "Tumblr.com"
|
||||
download = tumblr_download
|
||||
|
@ -3,6 +3,7 @@
|
||||
__all__ = ['twitter_download']
|
||||
|
||||
from ..common import *
|
||||
from .universal import *
|
||||
from .vine import vine_download
|
||||
|
||||
def extract_m3u(source):
|
||||
@ -15,13 +16,28 @@ def extract_m3u(source):
|
||||
return ['https://video.twimg.com%s' % i for i in s2]
|
||||
|
||||
def twitter_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
|
||||
if re.match(r'https?://pbs\.twimg\.com', url):
|
||||
universal_download(url, output_dir, merge=merge, info_only=info_only, **kwargs)
|
||||
return
|
||||
|
||||
if re.match(r'https?://mobile', url): # normalize mobile URL
|
||||
url = 'https://' + match1(url, r'//mobile\.(.+)')
|
||||
|
||||
html = get_html(url)
|
||||
screen_name = r1(r'data-screen-name="([^"]*)"', html) or \
|
||||
if re.match(r'https?://twitter\.com/i/moments/', url): # moments
|
||||
html = get_html(url, faker=True)
|
||||
paths = re.findall(r'data-permalink-path="([^"]+)"', html)
|
||||
for path in paths:
|
||||
twitter_download('https://twitter.com' + path,
|
||||
output_dir=output_dir,
|
||||
merge=merge,
|
||||
info_only=info_only,
|
||||
**kwargs)
|
||||
return
|
||||
|
||||
html = get_html(url, faker=False) # disable faker to prevent 302 infinite redirect
|
||||
screen_name = r1(r'twitter\.com/([^/]+)', url) or r1(r'data-screen-name="([^"]*)"', html) or \
|
||||
r1(r'<meta name="twitter:title" content="([^"]*)"', html)
|
||||
item_id = r1(r'data-item-id="([^"]*)"', html) or \
|
||||
item_id = r1(r'twitter\.com/[^/]+/status/(\d+)', url) or r1(r'data-item-id="([^"]*)"', html) or \
|
||||
r1(r'<meta name="twitter:site:id" content="([^"]*)"', html)
|
||||
page_title = "{} [{}]".format(screen_name, item_id)
|
||||
|
||||
@ -53,39 +69,26 @@ def twitter_download(url, output_dir='.', merge=True, info_only=False, **kwargs)
|
||||
output_dir=output_dir)
|
||||
|
||||
except: # extract video
|
||||
# always use i/cards or videos url
|
||||
if not re.match(r'https?://twitter.com/i/', url):
|
||||
url = r1(r'<meta\s*property="og:video:url"\s*content="([^"]+)"', html)
|
||||
if not url:
|
||||
url = 'https://twitter.com/i/videos/%s' % item_id
|
||||
html = get_content(url)
|
||||
#i_url = 'https://twitter.com/i/videos/' + item_id
|
||||
#i_content = get_content(i_url)
|
||||
#js_url = r1(r'src="([^"]+)"', i_content)
|
||||
#js_content = get_content(js_url)
|
||||
#authorization = r1(r'"(Bearer [^"]+)"', js_content)
|
||||
authorization = 'Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'
|
||||
|
||||
data_config = r1(r'data-config="([^"]*)"', html) or \
|
||||
r1(r'data-player-config="([^"]*)"', html)
|
||||
i = json.loads(unescape_html(data_config))
|
||||
if 'video_url' in i:
|
||||
source = i['video_url']
|
||||
item_id = i['tweet_id']
|
||||
page_title = "{} [{}]".format(screen_name, item_id)
|
||||
elif 'playlist' in i:
|
||||
source = i['playlist'][0]['source']
|
||||
if not item_id: page_title = i['playlist'][0]['contentId']
|
||||
elif 'vmap_url' in i:
|
||||
vmap_url = i['vmap_url']
|
||||
vmap = get_content(vmap_url)
|
||||
source = r1(r'<MediaFile>\s*<!\[CDATA\[(.*)\]\]>', vmap)
|
||||
item_id = i['tweet_id']
|
||||
page_title = "{} [{}]".format(screen_name, item_id)
|
||||
elif 'scribe_playlist_url' in i:
|
||||
scribe_playlist_url = i['scribe_playlist_url']
|
||||
return vine_download(scribe_playlist_url, output_dir, merge=merge, info_only=info_only)
|
||||
ga_url = 'https://api.twitter.com/1.1/guest/activate.json'
|
||||
ga_content = post_content(ga_url, headers={'authorization': authorization})
|
||||
guest_token = json.loads(ga_content)['guest_token']
|
||||
|
||||
try:
|
||||
urls = extract_m3u(source)
|
||||
except:
|
||||
urls = [source]
|
||||
api_url = 'https://api.twitter.com/2/timeline/conversation/%s.json?tweet_mode=extended' % item_id
|
||||
api_content = get_content(api_url, headers={'authorization': authorization, 'x-guest-token': guest_token})
|
||||
|
||||
info = json.loads(api_content)
|
||||
variants = info['globalObjects']['tweets'][item_id]['extended_entities']['media'][0]['video_info']['variants']
|
||||
variants = sorted(variants, key=lambda kv: kv.get('bitrate', 0))
|
||||
urls = [ variants[-1]['url'] ]
|
||||
size = urls_size(urls)
|
||||
mime, ext = 'video/mp4', 'mp4'
|
||||
mime, ext = variants[-1]['content_type'], 'mp4'
|
||||
|
||||
print_info(site_info, page_title, mime, size)
|
||||
if not info_only:
|
||||
|
@ -31,16 +31,37 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
|
||||
if page_title:
|
||||
page_title = unescape_html(page_title)
|
||||
|
||||
meta_videos = re.findall(r'<meta property="og:video:url" content="([^"]*)"', page)
|
||||
if meta_videos:
|
||||
try:
|
||||
for meta_video in meta_videos:
|
||||
meta_video_url = unescape_html(meta_video)
|
||||
type_, ext, size = url_info(meta_video_url)
|
||||
print_info(site_info, page_title, type_, size)
|
||||
if not info_only:
|
||||
download_urls([meta_video_url], page_title,
|
||||
ext, size,
|
||||
output_dir=output_dir, merge=merge,
|
||||
faker=True)
|
||||
except:
|
||||
pass
|
||||
else:
|
||||
return
|
||||
|
||||
hls_urls = re.findall(r'(https?://[^;"\'\\]+' + '\.m3u8?' +
|
||||
r'[^;"\'\\]*)', page)
|
||||
if hls_urls:
|
||||
for hls_url in hls_urls:
|
||||
type_, ext, size = url_info(hls_url)
|
||||
print_info(site_info, page_title, type_, size)
|
||||
if not info_only:
|
||||
download_url_ffmpeg(url=hls_url, title=page_title,
|
||||
ext='mp4', output_dir=output_dir)
|
||||
return
|
||||
try:
|
||||
for hls_url in hls_urls:
|
||||
type_, ext, size = url_info(hls_url)
|
||||
print_info(site_info, page_title, type_, size)
|
||||
if not info_only:
|
||||
download_url_ffmpeg(url=hls_url, title=page_title,
|
||||
ext='mp4', output_dir=output_dir)
|
||||
except:
|
||||
pass
|
||||
else:
|
||||
return
|
||||
|
||||
# most common media file extensions on the Internet
|
||||
media_exts = ['\.flv', '\.mp3', '\.mp4', '\.webm',
|
||||
@ -54,12 +75,12 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
|
||||
|
||||
urls = []
|
||||
for i in media_exts:
|
||||
urls += re.findall(r'(https?://[^;"\'\\]+' + i + r'[^;"\'\\]*)', page)
|
||||
urls += re.findall(r'(https?://[^ ;&"\'\\<>]+' + i + r'[^ ;&"\'\\<>]*)', page)
|
||||
|
||||
p_urls = re.findall(r'(https?%3A%2F%2F[^;&]+' + i + r'[^;&]*)', page)
|
||||
p_urls = re.findall(r'(https?%3A%2F%2F[^;&"]+' + i + r'[^;&"]*)', page)
|
||||
urls += [parse.unquote(url) for url in p_urls]
|
||||
|
||||
q_urls = re.findall(r'(https?:\\\\/\\\\/[^;"\']+' + i + r'[^;"\']*)', page)
|
||||
q_urls = re.findall(r'(https?:\\\\/\\\\/[^ ;"\'<>]+' + i + r'[^ ;"\'<>]*)', page)
|
||||
urls += [url.replace('\\\\/', '/') for url in q_urls]
|
||||
|
||||
# a link href to an image is often an interesting one
|
||||
@ -67,6 +88,17 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
|
||||
urls += re.findall(r'href="(https?://[^"]+\.png)"', page, re.I)
|
||||
urls += re.findall(r'href="(https?://[^"]+\.gif)"', page, re.I)
|
||||
|
||||
# <img> with high widths
|
||||
urls += re.findall(r'<img src="([^"]*)"[^>]*width="\d\d\d+"', page, re.I)
|
||||
|
||||
# relative path
|
||||
rel_urls = []
|
||||
rel_urls += re.findall(r'href="(\.[^"]+\.jpe?g)"', page, re.I)
|
||||
rel_urls += re.findall(r'href="(\.[^"]+\.png)"', page, re.I)
|
||||
rel_urls += re.findall(r'href="(\.[^"]+\.gif)"', page, re.I)
|
||||
for rel_url in rel_urls:
|
||||
urls += [ r1(r'(.*/)', url) + rel_url ]
|
||||
|
||||
# MPEG-DASH MPD
|
||||
mpd_urls = re.findall(r'src="(https?://[^"]+\.mpd)"', page)
|
||||
for mpd_url in mpd_urls:
|
||||
@ -80,34 +112,46 @@ def universal_download(url, output_dir='.', merge=True, info_only=False, **kwarg
|
||||
for url in set(urls):
|
||||
filename = parse.unquote(url.split('/')[-1])
|
||||
if 5 <= len(filename) <= 80:
|
||||
title = '.'.join(filename.split('.')[:-1])
|
||||
title = '.'.join(filename.split('.')[:-1]) or filename
|
||||
else:
|
||||
title = '%s' % i
|
||||
i += 1
|
||||
|
||||
if r1(r'(https://pinterest.com/pin/)', url):
|
||||
continue
|
||||
|
||||
candies.append({'url': url,
|
||||
'title': title})
|
||||
|
||||
for candy in candies:
|
||||
try:
|
||||
mime, ext, size = url_info(candy['url'], faker=True)
|
||||
if not size: size = float('Int')
|
||||
try:
|
||||
mime, ext, size = url_info(candy['url'], faker=False)
|
||||
assert size
|
||||
except:
|
||||
mime, ext, size = url_info(candy['url'], faker=True)
|
||||
if not size: size = float('Inf')
|
||||
except:
|
||||
continue
|
||||
else:
|
||||
print_info(site_info, candy['title'], ext, size)
|
||||
if not info_only:
|
||||
download_urls([candy['url']], candy['title'], ext, size,
|
||||
output_dir=output_dir, merge=merge,
|
||||
faker=True)
|
||||
try:
|
||||
download_urls([candy['url']], candy['title'], ext, size,
|
||||
output_dir=output_dir, merge=merge,
|
||||
faker=False)
|
||||
except:
|
||||
download_urls([candy['url']], candy['title'], ext, size,
|
||||
output_dir=output_dir, merge=merge,
|
||||
faker=True)
|
||||
return
|
||||
|
||||
else:
|
||||
# direct download
|
||||
filename = parse.unquote(url.split('/')[-1])
|
||||
title = '.'.join(filename.split('.')[:-1])
|
||||
ext = filename.split('.')[-1]
|
||||
_, _, size = url_info(url, faker=True)
|
||||
url_trunk = url.split('?')[0] # strip query string
|
||||
filename = parse.unquote(url_trunk.split('/')[-1]) or parse.unquote(url_trunk.split('/')[-2])
|
||||
title = '.'.join(filename.split('.')[:-1]) or filename
|
||||
_, ext, size = url_info(url, faker=True)
|
||||
print_info(site_info, title, ext, size)
|
||||
if not info_only:
|
||||
download_urls([url], title, ext, size,
|
||||
|
@@ -7,6 +7,24 @@ from urllib.parse import urlparse
from json import loads
import re

#----------------------------------------------------------------------
def miaopai_download_by_smid(smid, output_dir = '.', merge = True, info_only = False):
""""""
api_endpoint = 'https://n.miaopai.com/api/aj_media/info.json?smid={smid}'.format(smid = smid)

html = get_content(api_endpoint)

api_content = loads(html)

video_url = api_content['data']['meta_data'][0]['play_urls']['l']
title = api_content['data']['description']

type, ext, size = url_info(video_url)

print_info(site_info, title, type, size)
if not info_only:
download_urls([video_url], title, ext, size, output_dir, merge=merge)

#----------------------------------------------------------------------
def yixia_miaopai_download_by_scid(scid, output_dir = '.', merge = True, info_only = False):
""""""

@@ -47,14 +65,18 @@ def yixia_xiaokaxiu_download_by_scid(scid, output_dir = '.', merge = True, info_
def yixia_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
"""wrapper"""
hostname = urlparse(url).hostname
if 'miaopai.com' in hostname: #Miaopai
if 'n.miaopai.com' == hostname:
smid = match1(url, r'n\.miaopai\.com/media/([^.]+)')
miaopai_download_by_smid(smid, output_dir, merge, info_only)
return
elif 'miaopai.com' in hostname: #Miaopai
yixia_download_by_scid = yixia_miaopai_download_by_scid
site_info = "Yixia Miaopai"

scid = match1(url, r'miaopai\.com/show/channel/(.+)\.htm') or \
match1(url, r'miaopai\.com/show/(.+)\.htm') or \
match1(url, r'm\.miaopai\.com/show/channel/(.+)\.htm') or \
match1(url, r'm\.miaopai\.com/show/channel/(.+)')
scid = match1(url, r'miaopai\.com/show/channel/([^.]+)\.htm') or \
match1(url, r'miaopai\.com/show/([^.]+)\.htm') or \
match1(url, r'm\.miaopai\.com/show/channel/([^.]+)\.htm') or \
match1(url, r'm\.miaopai\.com/show/channel/([^.]+)')

elif 'xiaokaxiu.com' in hostname: #Xiaokaxiu
yixia_download_by_scid = yixia_xiaokaxiu_download_by_scid
@@ -78,7 +78,10 @@ class Youku(VideoExtractor):
self.api_error_code = None
self.api_error_msg = None

self.ccode = '0507'
self.ccode = '0519'
# Found in http://g.alicdn.com/player/ykplayer/0.5.64/youku-player.min.js
# grep -oE '"[0-9a-zA-Z+/=]{256}"' youku-player.min.js
self.ckey = 'DIl58SLFxFNndSV1GFNnMQVYkx1PP5tKe1siZu/86PR1u/Wh1Ptd+WOZsHHWxysSfAOhNJpdVWsdVJNsfJ8Sxd8WKVvNfAS8aS8fAOzYARzPyPc3JvtnPHjTdKfESTdnuTW6ZPvk2pNDh4uFzotgdMEFkzQ5wZVXl2Pf1/Y6hLK0OnCNxBj3+nb0v72gZ6b0td+WOZsHHWxysSo/0y9D2K42SaB8Y/+aD2K42SaB8Y/+ahU+WOZsHcrxysooUeND'
self.utid = None

def youku_ups(self):

@@ -86,6 +89,7 @@ class Youku(VideoExtractor):
url += '&client_ip=192.168.1.1'
url += '&utid=' + self.utid
url += '&client_ts=' + str(int(time.time()))
url += '&ckey=' + urllib.parse.quote(self.ckey)
if self.password_protected:
url += '&password=' + self.password
headers = dict(Referer=self.referer)
@ -8,35 +8,74 @@ from xml.dom.minidom import parseString
|
||||
class YouTube(VideoExtractor):
|
||||
name = "YouTube"
|
||||
|
||||
# YouTube media encoding options, in descending quality order.
|
||||
# Non-DASH YouTube media encoding options, in descending quality order.
|
||||
# http://en.wikipedia.org/wiki/YouTube#Quality_and_codecs. Retrieved July 17, 2014.
|
||||
stream_types = [
|
||||
{'itag': '38', 'container': 'MP4', 'video_resolution': '3072p', 'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3.5-5', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
{'itag': '38', 'container': 'MP4', 'video_resolution': '3072p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3.5-5',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
#{'itag': '85', 'container': 'MP4', 'video_resolution': '1080p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '3-4', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
{'itag': '46', 'container': 'WebM', 'video_resolution': '1080p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
|
||||
{'itag': '37', 'container': 'MP4', 'video_resolution': '1080p', 'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3-4.3', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
{'itag': '46', 'container': 'WebM', 'video_resolution': '1080p',
|
||||
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '',
|
||||
'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
|
||||
{'itag': '37', 'container': 'MP4', 'video_resolution': '1080p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '3-4.3',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
#{'itag': '102', 'container': 'WebM', 'video_resolution': '720p', 'video_encoding': 'VP8', 'video_profile': '3D', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
|
||||
{'itag': '45', 'container': 'WebM', 'video_resolution': '720p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '2', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
|
||||
{'itag': '45', 'container': 'WebM', 'video_resolution': '720p',
|
||||
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '2',
|
||||
'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
|
||||
#{'itag': '84', 'container': 'MP4', 'video_resolution': '720p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '2-3', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
{'itag': '22', 'container': 'MP4', 'video_resolution': '720p', 'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '2-3', 'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
{'itag': '120', 'container': 'FLV', 'video_resolution': '720p', 'video_encoding': 'H.264', 'video_profile': 'Main@L3.1', 'video_bitrate': '2', 'audio_encoding': 'AAC', 'audio_bitrate': '128'}, # Live streaming only
|
||||
{'itag': '44', 'container': 'WebM', 'video_resolution': '480p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '1', 'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
|
||||
{'itag': '35', 'container': 'FLV', 'video_resolution': '480p', 'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.8-1', 'audio_encoding': 'AAC', 'audio_bitrate': '128'},
|
||||
{'itag': '22', 'container': 'MP4', 'video_resolution': '720p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'High', 'video_bitrate': '2-3',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '192'},
|
||||
{'itag': '120', 'container': 'FLV', 'video_resolution': '720p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'Main@L3.1', 'video_bitrate': '2',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '128'}, # Live streaming only
|
||||
{'itag': '44', 'container': 'WebM', 'video_resolution': '480p',
|
||||
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '1',
|
||||
'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
|
||||
{'itag': '35', 'container': 'FLV', 'video_resolution': '480p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.8-1',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '128'},
|
||||
#{'itag': '101', 'container': 'WebM', 'video_resolution': '360p', 'video_encoding': 'VP8', 'video_profile': '3D', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '192'},
|
||||
#{'itag': '100', 'container': 'WebM', 'video_resolution': '360p', 'video_encoding': 'VP8', 'video_profile': '3D', 'video_bitrate': '', 'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
|
||||
{'itag': '43', 'container': 'WebM', 'video_resolution': '360p', 'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '0.5', 'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
|
||||
{'itag': '34', 'container': 'FLV', 'video_resolution': '360p', 'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '128'},
|
||||
{'itag': '43', 'container': 'WebM', 'video_resolution': '360p',
|
||||
'video_encoding': 'VP8', 'video_profile': '', 'video_bitrate': '0.5',
|
||||
'audio_encoding': 'Vorbis', 'audio_bitrate': '128'},
|
||||
{'itag': '34', 'container': 'FLV', 'video_resolution': '360p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'Main', 'video_bitrate': '0.5',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '128'},
|
||||
#{'itag': '82', 'container': 'MP4', 'video_resolution': '360p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '96'},
|
||||
{'itag': '18', 'container': 'MP4', 'video_resolution': '270p/360p', 'video_encoding': 'H.264', 'video_profile': 'Baseline', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '96'},
|
||||
{'itag': '6', 'container': 'FLV', 'video_resolution': '270p', 'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.8', 'audio_encoding': 'MP3', 'audio_bitrate': '64'},
|
||||
{'itag': '18', 'container': 'MP4', 'video_resolution': '360p',
|
||||
'video_encoding': 'H.264', 'video_profile': 'Baseline', 'video_bitrate': '0.5',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '96'},
|
||||
{'itag': '6', 'container': 'FLV', 'video_resolution': '270p',
|
||||
'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.8',
|
||||
'audio_encoding': 'MP3', 'audio_bitrate': '64'},
|
||||
#{'itag': '83', 'container': 'MP4', 'video_resolution': '240p', 'video_encoding': 'H.264', 'video_profile': '3D', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': '96'},
|
||||
{'itag': '13', 'container': '3GP', 'video_resolution': '', 'video_encoding': 'MPEG-4 Visual', 'video_profile': '', 'video_bitrate': '0.5', 'audio_encoding': 'AAC', 'audio_bitrate': ''},
|
||||
{'itag': '5', 'container': 'FLV', 'video_resolution': '240p', 'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.25', 'audio_encoding': 'MP3', 'audio_bitrate': '64'},
|
||||
{'itag': '36', 'container': '3GP', 'video_resolution': '240p', 'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.175', 'audio_encoding': 'AAC', 'audio_bitrate': '36'},
|
||||
{'itag': '17', 'container': '3GP', 'video_resolution': '144p', 'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.05', 'audio_encoding': 'AAC', 'audio_bitrate': '24'},
|
||||
{'itag': '13', 'container': '3GP', 'video_resolution': '',
|
||||
'video_encoding': 'MPEG-4 Visual', 'video_profile': '', 'video_bitrate': '0.5',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': ''},
|
||||
{'itag': '5', 'container': 'FLV', 'video_resolution': '240p',
|
||||
'video_encoding': 'Sorenson H.263', 'video_profile': '', 'video_bitrate': '0.25',
|
||||
'audio_encoding': 'MP3', 'audio_bitrate': '64'},
|
||||
{'itag': '36', 'container': '3GP', 'video_resolution': '240p',
|
||||
'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.175',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '32'},
|
||||
{'itag': '17', 'container': '3GP', 'video_resolution': '144p',
|
||||
'video_encoding': 'MPEG-4 Visual', 'video_profile': 'Simple', 'video_bitrate': '0.05',
|
||||
'audio_encoding': 'AAC', 'audio_bitrate': '24'},
|
||||
]
|
||||
|
||||
def decipher(js, s):
|
||||
# Examples:
|
||||
# - https://www.youtube.com/yts/jsbin/player-da_DK-vflWlK-zq/base.js
|
||||
# - https://www.youtube.com/yts/jsbin/player-vflvABTsY/da_DK/base.js
|
||||
# - https://www.youtube.com/yts/jsbin/player-vfls4aurX/da_DK/base.js
|
||||
# - https://www.youtube.com/yts/jsbin/player_ias-vfl_RGK2l/en_US/base.js
|
||||
# - https://www.youtube.com/yts/jsbin/player-vflRjqq_w/da_DK/base.js
|
||||
# - https://www.youtube.com/yts/jsbin/player_ias-vfl-jbnrr/da_DK/base.js
|
||||
def tr_js(code):
|
||||
code = re.sub(r'function', r'def', code)
|
||||
code = re.sub(r'(\W)(as|if|in|is|or)\(', r'\1_\2(', code)
|
||||
@ -52,11 +91,14 @@ class YouTube(VideoExtractor):
|
||||
return code
|
||||
|
||||
js = js.replace('\n', ' ')
|
||||
f1 = match1(js, r'"signature",([$\w]+)\(\w+\.\w+\)')
|
||||
f1 = match1(js, r'\.set\(\w+\.sp,encodeURIComponent\(([$\w]+)') or \
|
||||
match1(js, r'\.set\(\w+\.sp,\(0,window\.encodeURIComponent\)\(([$\w]+)') or \
|
||||
match1(js, r'\.set\(\w+\.sp,([$\w]+)\(\w+\.s\)\)') or \
|
||||
match1(js, r'"signature",([$\w]+)\(\w+\.\w+\)')
|
||||
f1def = match1(js, r'function %s(\(\w+\)\{[^\{]+\})' % re.escape(f1)) or \
|
||||
match1(js, r'\W%s=function(\(\w+\)\{[^\{]+\})' % re.escape(f1))
|
||||
f1def = re.sub(r'([$\w]+\.)([$\w]+\(\w+,\d+\))', r'\2', f1def)
|
||||
f1def = 'function %s%s' % (f1, f1def)
|
||||
f1def = 'function main_%s%s' % (f1, f1def) # prefix to avoid potential namespace conflict
|
||||
code = tr_js(f1def)
|
||||
f2s = set(re.findall(r'([$\w]+)\(\w+,\d+\)', f1def))
|
||||
for f2 in f2s:
|
||||
@ -67,16 +109,26 @@ class YouTube(VideoExtractor):
|
||||
else:
|
||||
f2def = re.search(r'[^$\w]%s:function\((\w+)\)(\{[^\{\}]+\})' % f2e, js)
|
||||
f2def = 'function {}({},b){}'.format(f2e, f2def.group(1), f2def.group(2))
|
||||
f2 = re.sub(r'(\W)(as|if|in|is|or)\(', r'\1_\2(', f2)
|
||||
f2 = re.sub(r'(as|if|in|is|or)', r'_\1', f2)
|
||||
f2 = re.sub(r'\$', '_dollar', f2)
|
||||
code = code + 'global %s\n' % f2 + tr_js(f2def)
|
||||
|
||||
f1 = re.sub(r'(as|if|in|is|or)', r'_\1', f1)
|
||||
f1 = re.sub(r'\$', '_dollar', f1)
|
||||
code = code + 'sig=%s(s)' % f1
|
||||
code = code + 'sig=main_%s(s)' % f1 # prefix to avoid potential namespace conflict
|
||||
exec(code, globals(), locals())
|
||||
return locals()['sig']

def chunk_by_range(url, size):
urls = []
chunk_size = 10485760
start, end = 0, chunk_size - 1
urls.append('%s&range=%s-%s' % (url, start, end))
while end + 1 < size: # processed size < expected size
start, end = end + 1, end + chunk_size
urls.append('%s&range=%s-%s' % (url, start, end))
return urls
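
For illustration (not part of the commit): chunk_by_range() slices a download URL into 10 MiB range requests, so a hypothetical 25 MB stream would be fetched in three chunks:

```
urls = chunk_by_range('https://example.com/video.mp4?itag=22', 25000000)
# ['https://example.com/video.mp4?itag=22&range=0-10485759',
#  'https://example.com/video.mp4?itag=22&range=10485760-20971519',
#  'https://example.com/video.mp4?itag=22&range=20971520-31457279']
```

The last range may run past the real size; the server simply returns whatever is left.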
def get_url_from_vid(vid):
|
||||
return 'https://youtu.be/{}'.format(vid)
|
||||
|
||||
@ -128,7 +180,10 @@ class YouTube(VideoExtractor):
|
||||
for video in videos:
|
||||
vid = parse_query_param(video, 'v')
|
||||
index = parse_query_param(video, 'index')
|
||||
self.__class__().download_by_url(self.__class__.get_url_from_vid(vid), index=index, **kwargs)
|
||||
try:
|
||||
self.__class__().download_by_url(self.__class__.get_url_from_vid(vid), index=index, **kwargs)
|
||||
except:
|
||||
pass
|
||||
|
||||
def prepare(self, **kwargs):
|
||||
assert self.url or self.vid
|
||||
@ -140,15 +195,22 @@ class YouTube(VideoExtractor):
|
||||
self.download_playlist_by_url(self.url, **kwargs)
|
||||
exit(0)
|
||||
|
||||
video_info = parse.parse_qs(get_content('https://www.youtube.com/get_video_info?video_id={}'.format(self.vid)))
|
||||
if re.search('\Wlist=', self.url) and not kwargs.get('playlist'):
|
||||
log.w('This video is from a playlist. (use --playlist to download all videos in the playlist.)')
|
||||
|
||||
# Get video info
|
||||
# 'eurl' is a magic parameter that can bypass age restriction
|
||||
# full form: 'eurl=https%3A%2F%2Fyoutube.googleapis.com%2Fv%2F{VIDEO_ID}'
|
||||
video_info = parse.parse_qs(get_content('https://www.youtube.com/get_video_info?video_id={}&eurl=https%3A%2F%2Fy'.format(self.vid)))
|
||||
logging.debug('STATUS: %s' % video_info['status'][0])
|
||||
|
||||
ytplayer_config = None
|
||||
if 'status' not in video_info:
|
||||
log.wtf('[Failed] Unknown status.')
|
||||
log.wtf('[Failed] Unknown status.', exit_code=None)
|
||||
raise
|
||||
elif video_info['status'] == ['ok']:
|
||||
if 'use_cipher_signature' not in video_info or video_info['use_cipher_signature'] == ['False']:
|
||||
self.title = parse.unquote_plus(video_info['title'][0])
|
||||
|
||||
self.title = parse.unquote_plus(json.loads(video_info["player_response"][0])["videoDetails"]["title"])
|
||||
# Parse video page (for DASH)
|
||||
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
|
||||
try:
|
||||
@ -156,27 +218,50 @@ class YouTube(VideoExtractor):
|
||||
self.html5player = 'https://www.youtube.com' + ytplayer_config['assets']['js']
|
||||
# Workaround: get_video_info returns bad s. Why?
|
||||
stream_list = ytplayer_config['args']['url_encoded_fmt_stream_map'].split(',')
|
||||
#stream_list = ytplayer_config['args']['adaptive_fmts'].split(',')
|
||||
except:
|
||||
stream_list = video_info['url_encoded_fmt_stream_map'][0].split(',')
|
||||
self.html5player = None
|
||||
if re.search('([^"]*/base\.js)"', video_page):
|
||||
self.html5player = 'https://www.youtube.com' + re.search('([^"]*/base\.js)"', video_page).group(1)
|
||||
else:
|
||||
self.html5player = None
|
||||
|
||||
else:
|
||||
# Parse video page instead
|
||||
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
|
||||
ytplayer_config = json.loads(re.search('ytplayer.config\s*=\s*([^\n]+?});', video_page).group(1))
|
||||
|
||||
self.title = ytplayer_config['args']['title']
|
||||
self.title = json.loads(ytplayer_config["args"]["player_response"])["videoDetails"]["title"]
|
||||
self.html5player = 'https://www.youtube.com' + ytplayer_config['assets']['js']
|
||||
stream_list = ytplayer_config['args']['url_encoded_fmt_stream_map'].split(',')
|
||||
|
||||
elif video_info['status'] == ['fail']:
|
||||
logging.debug('ERRORCODE: %s' % video_info['errorcode'][0])
|
||||
if video_info['errorcode'] == ['150']:
|
||||
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
|
||||
# FIXME: still relevant?
|
||||
if cookies:
|
||||
# Load necessary cookies into headers (for age-restricted videos)
|
||||
consent, ssid, hsid, sid = 'YES', '', '', ''
|
||||
for cookie in cookies:
|
||||
if cookie.domain.endswith('.youtube.com'):
|
||||
if cookie.name == 'SSID':
|
||||
ssid = cookie.value
|
||||
elif cookie.name == 'HSID':
|
||||
hsid = cookie.value
|
||||
elif cookie.name == 'SID':
|
||||
sid = cookie.value
|
||||
cookie_str = 'CONSENT=%s; SSID=%s; HSID=%s; SID=%s' % (consent, ssid, hsid, sid)
|
||||
|
||||
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid,
|
||||
headers={'Cookie': cookie_str})
|
||||
else:
|
||||
video_page = get_content('https://www.youtube.com/watch?v=%s' % self.vid)
|
||||
|
||||
try:
|
||||
ytplayer_config = json.loads(re.search('ytplayer.config\s*=\s*([^\n]+});ytplayer', video_page).group(1))
|
||||
except:
|
||||
msg = re.search('class="message">([^<]+)<', video_page).group(1)
|
||||
log.wtf('[Failed] "%s"' % msg.strip())
|
||||
log.wtf('[Failed] Got message "%s". Try to login with --cookies.' % msg.strip())
|
||||
|
||||
if 'title' in ytplayer_config['args']:
|
||||
# 150 Restricted from playback on certain sites
|
||||
@ -185,22 +270,30 @@ class YouTube(VideoExtractor):
|
||||
self.html5player = 'https://www.youtube.com' + ytplayer_config['assets']['js']
|
||||
stream_list = ytplayer_config['args']['url_encoded_fmt_stream_map'].split(',')
|
||||
else:
|
||||
log.wtf('[Error] The uploader has not made this video available in your country.')
|
||||
log.wtf('[Error] The uploader has not made this video available in your country.', exit_code=None)
|
||||
raise
|
||||
#self.title = re.search('<meta name="title" content="([^"]+)"', video_page).group(1)
|
||||
#stream_list = []
|
||||
|
||||
elif video_info['errorcode'] == ['100']:
|
||||
log.wtf('[Failed] This video does not exist.', exit_code=int(video_info['errorcode'][0]))
|
||||
log.wtf('[Failed] This video does not exist.', exit_code=None) #int(video_info['errorcode'][0])
|
||||
raise
|
||||
|
||||
else:
|
||||
log.wtf('[Failed] %s' % video_info['reason'][0], exit_code=int(video_info['errorcode'][0]))
|
||||
log.wtf('[Failed] %s' % video_info['reason'][0], exit_code=None) #int(video_info['errorcode'][0])
|
||||
raise
|
||||
|
||||
else:
|
||||
log.wtf('[Failed] Invalid status.')
|
||||
log.wtf('[Failed] Invalid status.', exit_code=None)
|
||||
raise
|
||||
|
||||
# YouTube Live
|
||||
if ytplayer_config and (ytplayer_config['args'].get('livestream') == '1' or ytplayer_config['args'].get('live_playback') == '1'):
|
||||
hlsvp = ytplayer_config['args']['hlsvp']
|
||||
if 'hlsvp' in ytplayer_config['args']:
|
||||
hlsvp = ytplayer_config['args']['hlsvp']
|
||||
else:
|
||||
player_response= json.loads(ytplayer_config['args']['player_response'])
|
||||
log.e('[Failed] %s' % player_response['playabilityStatus']['reason'], exit_code=1)
|
||||
|
||||
if 'info_only' in kwargs and kwargs['info_only']:
|
||||
return
|
||||
@ -216,7 +309,8 @@ class YouTube(VideoExtractor):
|
||||
'url': metadata['url'][0],
|
||||
'sig': metadata['sig'][0] if 'sig' in metadata else None,
|
||||
's': metadata['s'][0] if 's' in metadata else None,
|
||||
'quality': metadata['quality'][0],
|
||||
'quality': metadata['quality'][0] if 'quality' in metadata else None,
|
||||
#'quality': metadata['quality_label'][0] if 'quality_label' in metadata else None,
|
||||
'type': metadata['type'][0],
|
||||
'mime': metadata['type'][0].split(';')[0],
|
||||
'container': mime_to_container(metadata['type'][0].split(';')[0]),
|
||||
@ -286,13 +380,15 @@ class YouTube(VideoExtractor):
|
||||
if not dash_size:
|
||||
try: dash_size = url_size(dash_url)
|
||||
except: continue
|
||||
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
|
||||
dash_mp4_a_urls = self.__class__.chunk_by_range(dash_mp4_a_url, int(dash_mp4_a_size))
|
||||
self.dash_streams[itag] = {
|
||||
'quality': '%sx%s' % (w, h),
|
||||
'itag': itag,
|
||||
'type': mimeType,
|
||||
'mime': mimeType,
|
||||
'container': 'mp4',
|
||||
'src': [dash_url, dash_mp4_a_url],
|
||||
'src': [dash_urls, dash_mp4_a_urls],
|
||||
'size': int(dash_size) + int(dash_mp4_a_size)
|
||||
}
|
||||
elif mimeType == 'video/webm':
|
||||
@ -306,75 +402,97 @@ class YouTube(VideoExtractor):
|
||||
if not dash_size:
|
||||
try: dash_size = url_size(dash_url)
|
||||
except: continue
|
||||
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
|
||||
dash_webm_a_urls = self.__class__.chunk_by_range(dash_webm_a_url, int(dash_webm_a_size))
|
||||
self.dash_streams[itag] = {
|
||||
'quality': '%sx%s' % (w, h),
|
||||
'itag': itag,
|
||||
'type': mimeType,
|
||||
'mime': mimeType,
|
||||
'container': 'webm',
|
||||
'src': [dash_url, dash_webm_a_url],
|
||||
'src': [dash_urls, dash_webm_a_urls],
|
||||
'size': int(dash_size) + int(dash_webm_a_size)
|
||||
}
|
||||
except:
|
||||
# VEVO
|
||||
if not self.html5player: return
|
||||
self.js = get_content(self.html5player)
|
||||
if 'adaptive_fmts' in ytplayer_config['args']:
|
||||
|
||||
try:
|
||||
# Video info from video page (not always available)
|
||||
streams = [dict([(i.split('=')[0],
|
||||
parse.unquote(i.split('=')[1]))
|
||||
for i in afmt.split('&')])
|
||||
for afmt in ytplayer_config['args']['adaptive_fmts'].split(',')]
|
||||
for stream in streams: # get over speed limiting
|
||||
stream['url'] += '&ratebypass=yes'
|
||||
for stream in streams: # audio
|
||||
if stream['type'].startswith('audio/mp4'):
|
||||
dash_mp4_a_url = stream['url']
|
||||
except:
|
||||
streams = [dict([(i.split('=')[0],
|
||||
parse.unquote(i.split('=')[1]))
|
||||
for i in afmt.split('&')])
|
||||
for afmt in video_info['adaptive_fmts'][0].split(',')]
|
||||
|
||||
for stream in streams: # get over speed limiting
|
||||
stream['url'] += '&ratebypass=yes'
|
||||
for stream in streams: # audio
|
||||
if stream['type'].startswith('audio/mp4'):
|
||||
dash_mp4_a_url = stream['url']
|
||||
if 's' in stream:
|
||||
sig = self.__class__.decipher(self.js, stream['s'])
|
||||
dash_mp4_a_url += '&sig={}'.format(sig)
|
||||
dash_mp4_a_size = stream['clen']
|
||||
elif stream['type'].startswith('audio/webm'):
|
||||
dash_webm_a_url = stream['url']
|
||||
if 's' in stream:
|
||||
sig = self.__class__.decipher(self.js, stream['s'])
|
||||
dash_webm_a_url += '&sig={}'.format(sig)
|
||||
dash_webm_a_size = stream['clen']
|
||||
for stream in streams: # video
|
||||
if 'size' in stream:
|
||||
if stream['type'].startswith('video/mp4'):
|
||||
mimeType = 'video/mp4'
|
||||
dash_url = stream['url']
|
||||
if 's' in stream:
|
||||
sig = self.__class__.decipher(self.js, stream['s'])
|
||||
dash_mp4_a_url += '&signature={}'.format(sig)
|
||||
dash_mp4_a_size = stream['clen']
|
||||
elif stream['type'].startswith('audio/webm'):
|
||||
dash_webm_a_url = stream['url']
|
||||
dash_url += '&sig={}'.format(sig)
|
||||
dash_size = stream['clen']
|
||||
itag = stream['itag']
|
||||
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
|
||||
dash_mp4_a_urls = self.__class__.chunk_by_range(dash_mp4_a_url, int(dash_mp4_a_size))
|
||||
self.dash_streams[itag] = {
|
||||
'quality': '%s (%s)' % (stream['size'], stream['quality_label']),
|
||||
'itag': itag,
|
||||
'type': mimeType,
|
||||
'mime': mimeType,
|
||||
'container': 'mp4',
|
||||
'src': [dash_urls, dash_mp4_a_urls],
|
||||
'size': int(dash_size) + int(dash_mp4_a_size)
|
||||
}
|
||||
elif stream['type'].startswith('video/webm'):
|
||||
mimeType = 'video/webm'
|
||||
dash_url = stream['url']
|
||||
if 's' in stream:
|
||||
sig = self.__class__.decipher(self.js, stream['s'])
|
||||
dash_webm_a_url += '&signature={}'.format(sig)
|
||||
dash_webm_a_size = stream['clen']
|
||||
for stream in streams: # video
|
||||
if 'size' in stream:
|
||||
if stream['type'].startswith('video/mp4'):
|
||||
mimeType = 'video/mp4'
|
||||
dash_url = stream['url']
|
||||
if 's' in stream:
|
||||
sig = self.__class__.decipher(self.js, stream['s'])
|
||||
dash_url += '&signature={}'.format(sig)
|
||||
dash_size = stream['clen']
|
||||
itag = stream['itag']
|
||||
self.dash_streams[itag] = {
|
||||
'quality': stream['size'],
|
||||
'itag': itag,
|
||||
'type': mimeType,
|
||||
'mime': mimeType,
|
||||
'container': 'mp4',
|
||||
'src': [dash_url, dash_mp4_a_url],
|
||||
'size': int(dash_size) + int(dash_mp4_a_size)
|
||||
}
|
||||
elif stream['type'].startswith('video/webm'):
|
||||
mimeType = 'video/webm'
|
||||
dash_url = stream['url']
|
||||
if 's' in stream:
|
||||
sig = self.__class__.decipher(self.js, stream['s'])
|
||||
dash_url += '&signature={}'.format(sig)
|
||||
dash_size = stream['clen']
|
||||
itag = stream['itag']
|
||||
self.dash_streams[itag] = {
|
||||
'quality': stream['size'],
|
||||
'itag': itag,
|
||||
'type': mimeType,
|
||||
'mime': mimeType,
|
||||
'container': 'webm',
|
||||
'src': [dash_url, dash_webm_a_url],
|
||||
'size': int(dash_size) + int(dash_webm_a_size)
|
||||
}
|
||||
dash_url += '&sig={}'.format(sig)
|
||||
dash_size = stream['clen']
|
||||
itag = stream['itag']
|
||||
audio_url = None
|
||||
audio_size = None
|
||||
try:
|
||||
audio_url = dash_webm_a_url
|
||||
audio_size = int(dash_webm_a_size)
|
||||
except UnboundLocalError as e:
|
||||
audio_url = dash_mp4_a_url
|
||||
audio_size = int(dash_mp4_a_size)
|
||||
dash_urls = self.__class__.chunk_by_range(dash_url, int(dash_size))
|
||||
audio_urls = self.__class__.chunk_by_range(audio_url, int(audio_size))
|
||||
self.dash_streams[itag] = {
|
||||
'quality': '%s (%s)' % (stream['size'], stream['quality_label']),
|
||||
'itag': itag,
|
||||
'type': mimeType,
|
||||
'mime': mimeType,
|
||||
'container': 'webm',
|
||||
'src': [dash_urls, audio_urls],
|
||||
'size': int(dash_size) + int(audio_size)
|
||||
}
|
||||
|
||||
def extract(self, **kwargs):
|
||||
if not self.streams_sorted:
|
||||
@ -396,13 +514,13 @@ class YouTube(VideoExtractor):
|
||||
src = self.streams[stream_id]['url']
|
||||
if self.streams[stream_id]['sig'] is not None:
|
||||
sig = self.streams[stream_id]['sig']
|
||||
src += '&signature={}'.format(sig)
|
||||
src += '&sig={}'.format(sig)
|
||||
elif self.streams[stream_id]['s'] is not None:
|
||||
if not hasattr(self, 'js'):
|
||||
self.js = get_content(self.html5player)
|
||||
s = self.streams[stream_id]['s']
|
||||
sig = self.__class__.decipher(self.js, s)
|
||||
src += '&signature={}'.format(sig)
|
||||
src += '&sig={}'.format(sig)
|
||||
|
||||
self.streams[stream_id]['src'] = [src]
|
||||
self.streams[stream_id]['size'] = urls_size(self.streams[stream_id]['src'])
|
||||
|
src/you_get/extractors/zhibo.py (new file, 55 lines)

@@ -0,0 +1,55 @@
#!/usr/bin/env python

__all__ = ['zhibo_download']

from ..common import *

def zhibo_vedio_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
# http://video.zhibo.tv/video/details/d103057f-663e-11e8-9d83-525400ccac43.html

html = get_html(url)
title = r1(r'<title>([\s\S]*)</title>', html)
total_size = 0
part_urls= []

video_html = r1(r'<script type="text/javascript">([\s\S]*)</script></head>', html)

# video_guessulike = r1(r"window.xgData =([s\S'\s\.]*)\'\;[\s\S]*window.vouchData", video_html)
video_url = r1(r"window.vurl = \'([s\S'\s\.]*)\'\;[\s\S]*window.imgurl", video_html)
part_urls.append(video_url)
ext = video_url.split('.')[-1]

print_info(site_info, title, ext, total_size)
if not info_only:
download_urls(part_urls, title, ext, total_size, output_dir=output_dir, merge=merge)


def zhibo_download(url, output_dir = '.', merge = True, info_only = False, **kwargs):
if 'video.zhibo.tv' in url:
zhibo_vedio_download(url, output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
return

# if 'v.zhibo.tv' in url:
# http://v.zhibo.tv/31609372
html = get_html(url)
title = r1(r'<title>([\s\S]*)</title>', html)
is_live = r1(r"window.videoIsLive=\'([s\S'\s\.]*)\'\;[\s\S]*window.resDomain", html)
if is_live != "1":
raise ValueError("The live stream is not online! (Errno:%s)" % is_live)

match = re.search(r"""
ourStreamName .*?
'(.*?)' .*?
rtmpHighSource .*?
'(.*?)' .*?
'(.*?)'
""", html, re.S | re.X)
real_url = match.group(3) + match.group(1) + match.group(2)

print_info(site_info, title, 'flv', float('inf'))
if not info_only:
download_url_ffmpeg(real_url, title, 'flv', params={}, output_dir=output_dir, merge=merge)

site_info = "zhibo.tv"
download = zhibo_download
download_playlist = playlist_not_supported('zhibo')
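
The live branch above pulls three fragments out of the page with one verbose regex and glues them together as group(3) + group(1) + group(2). A sketch with placeholder values (a real page supplies them through window.ourStreamName and window.rtmpHighSource; the prefix below is invented):

```
stream_name = '31609372'                       # group(1): ourStreamName
high_suffix = '_high'                          # group(2): rtmpHighSource
play_prefix = 'rtmp://play.example.com/live/'  # group(3): the playback domain/prefix
real_url = play_prefix + stream_name + high_suffix
```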
src/you_get/extractors/zhihu.py (new file, 79 lines)

@@ -0,0 +1,79 @@
#!/usr/bin/env python

__all__ = ['zhihu_download', 'zhihu_download_playlist']

from ..common import *
import json


def zhihu_download(url, output_dir='.', merge=True, info_only=False, **kwargs):
paths = url.split("/")
# question or column
if len(paths) < 3 and len(paths) < 6:
raise TypeError("URL does not conform to specifications, Support column and question only."
"Example URL: https://zhuanlan.zhihu.com/p/51669862 or "
"https://www.zhihu.com/question/267782048/answer/490720324")

if ("question" not in paths or "answer" not in paths) and "zhuanlan.zhihu.com" not in paths:
raise TypeError("URL does not conform to specifications, Support column and question only."
"Example URL: https://zhuanlan.zhihu.com/p/51669862 or "
"https://www.zhihu.com/question/267782048/answer/490720324")

html = get_html(url, faker=True)
title = match1(html, r'data-react-helmet="true">(.*?)</title>')
for index, video_id in enumerate(matchall(html, [r'<a class="video-box" href="\S+video/(\d+)"'])):
try:
video_info = json.loads(
get_content(r"https://lens.zhihu.com/api/videos/{}".format(video_id), headers=fake_headers))
except json.decoder.JSONDecodeError:
log.w("Video id not found:{}".format(video_id))
continue

play_list = video_info["playlist"]
# first High Definition
# second Second Standard Definition
# third ld. What is ld ?
# finally continue
data = play_list.get("hd", play_list.get("sd", play_list.get("ld", None)))
if not data:
log.w("Video id No play address:{}".format(video_id))
continue
print_info(site_info, title, data["format"], data["size"])
if not info_only:
ext = "_{}.{}".format(index, data["format"])
if kwargs.get("zhihu_offset"):
ext = "_{}".format(kwargs["zhihu_offset"]) + ext
download_urls([data["play_url"]], title, ext, data["size"],
output_dir=output_dir, merge=merge, **kwargs)


def zhihu_download_playlist(url, output_dir='.', merge=True, info_only=False, **kwargs):
if "question" not in url or "answer" in url: # question page
raise TypeError("URL does not conform to specifications, Support question only."
" Example URL: https://www.zhihu.com/question/267782048")
url = url.split("?")[0]
if url[-1] == "/":
question_id = url.split("/")[-2]
else:
question_id = url.split("/")[-1]
videos_url = r"https://www.zhihu.com/api/v4/questions/{}/answers".format(question_id)
try:
questions = json.loads(get_content(videos_url))
except json.decoder.JSONDecodeError:
raise TypeError("Check whether the problem URL exists.Example URL: https://www.zhihu.com/question/267782048")

count = 0
while 1:
for data in questions["data"]:
kwargs["zhihu_offset"] = count
zhihu_download("https://www.zhihu.com/question/{}/answer/{}".format(question_id, data["id"]),
output_dir=output_dir, merge=merge, info_only=info_only, **kwargs)
count += 1
if questions["paging"]["is_end"]:
return
questions = json.loads(get_content(questions["paging"]["next"], headers=fake_headers))


site_info = "zhihu.com"
download = zhihu_download
download_playlist = zhihu_download_playlist
@@ -1,8 +1,9 @@
#!/usr/bin/env python

import logging
import os.path
import os
import subprocess
import sys
from ..util.strings import parameterize
from ..common import print_more_compatible as print

@@ -21,12 +22,10 @@ def get_usable_ffmpeg(cmd):
out, err = p.communicate()
vers = str(out, 'utf-8').split('\n')[0].split()
assert (vers[0] == 'ffmpeg' and vers[2][0] > '0') or (vers[0] == 'avconv')
#set version to 1.0 for nightly build and print warning
try:
version = [int(i) for i in vers[2].split('.')]
v = vers[2][1:] if vers[2][0] == 'n' else vers[2]
version = [int(i) for i in v.split('.')]
except:
print('It seems that your ffmpeg is a nightly build.')
print('Please switch to the latest stable if merging failed.')
version = [1, 0]
return cmd, 'ffprobe', version
except:

@@ -60,14 +59,25 @@ def ffmpeg_concat_av(files, output, ext):
params = [FFMPEG] + LOGLEVEL
for file in files:
if os.path.isfile(file): params.extend(['-i', file])
params.extend(['-c:v', 'copy'])
if ext == 'mp4':
params.extend(['-c:a', 'aac'])
elif ext == 'webm':
params.extend(['-c:a', 'vorbis'])
params.extend(['-strict', 'experimental'])
params.extend(['-c', 'copy'])
params.append(output)
return subprocess.call(params, stdin=STDIN)
if subprocess.call(params, stdin=STDIN):
print('Merging without re-encode failed.\nTry again re-encoding audio... ', end="", flush=True)
try: os.remove(output)
except FileNotFoundError: pass
params = [FFMPEG] + LOGLEVEL
for file in files:
if os.path.isfile(file): params.extend(['-i', file])
params.extend(['-c:v', 'copy'])
if ext == 'mp4':
params.extend(['-c:a', 'aac'])
params.extend(['-strict', 'experimental'])
elif ext == 'webm':
params.extend(['-c:a', 'opus'])
params.append(output)
return subprocess.call(params, stdin=STDIN)
else:
return 0

def ffmpeg_convert_ts_to_mkv(files, output='output.mkv'):
for file in files:
|
||||
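The rewritten `ffmpeg_concat_av` above tries a plain stream copy first and only re-encodes the audio if that call fails. A hedged sketch of the same retry pattern, reduced to a single helper (the `merge_av` name, file arguments, and codec choices are illustrative, not you-get's actual API):

```
import subprocess

def merge_av(video, audio, output, ext='mp4'):
    """Try a lossless stream copy first; re-encode audio only if that fails."""
    def run(extra):
        return subprocess.call(['ffmpeg', '-y', '-i', video, '-i', audio] + extra + [output])

    # First attempt: copy both streams (fast, no quality loss).
    if run(['-c', 'copy']) == 0:
        return 0

    # Fallback: keep the video stream, re-encode audio to a codec the container accepts.
    audio_codec = 'aac' if ext == 'mp4' else 'libopus'
    return run(['-c:v', 'copy', '-c:a', audio_codec])
```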
@@ -210,7 +220,7 @@ def ffmpeg_concat_mp4_to_mp4(files, output='output.mp4'):
def ffmpeg_download_stream(files, title, ext, params={}, output_dir='.', stream=True):
    """str, str->True
    WARNING: NOT THE SAME PARMS AS OTHER FUNCTIONS!!!!!!
    You can basicly download anything with this function
    You can basically download anything with this function
    but better leave it alone with
    """
    output = title + '.' + ext
@@ -257,6 +267,7 @@ def ffmpeg_concat_audio_and_video(files, output, ext):
    if has_ffmpeg_installed:
        params = [FFMPEG] + LOGLEVEL
        params.extend(['-f', 'concat'])
        params.extend(['-safe', '0'])  # https://stackoverflow.com/questions/38996925/ffmpeg-concat-unsafe-file-name
        for file in files:
            if os.path.isfile(file):
                params.extend(['-i', file])

@@ -1,8 +1,8 @@
#!/usr/bin/env python

import platform
from .os import detect_os

def legitimize(text, os=platform.system()):
def legitimize(text, os=detect_os()):
    """Converts a string to a valid filename.
    """

@@ -13,7 +13,8 @@ def legitimize(text, os=platform.system()):
        ord('|'): '-',
    })

    if os == 'Windows':
    # FIXME: do some filesystem detection
    if os == 'windows' or os == 'cygwin' or os == 'wsl':
        # Windows (non-POSIX namespace)
        text = text.translate({
            # Reserved in Windows VFAT and NTFS
@@ -28,10 +29,11 @@ def legitimize(text, os=platform.system()):
            ord('>'): '-',
            ord('['): '(',
            ord(']'): ')',
            ord('\t'): ' ',
        })
    else:
        # *nix
        if os == 'Darwin':
        if os == 'mac':
            # Mac OS HFS+
            text = text.translate({
                ord(':'): '-',
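`legitimize` relies on `str.translate` with a mapping from code points to replacement characters; the hunks above only change which mapping is applied for which `detect_os()` value. A trimmed sketch of that mechanism (the mapping below is a small illustrative subset, not the full table from `fs.py`):

```
# Small illustrative subset of the Windows-reserved characters handled above.
def sanitize_for_windows(name):
    return name.translate({
        ord('?'): '-',
        ord('*'): '-',
        ord(':'): '-',
        ord('"'): "'",
        ord('|'): '-',
    })

print(sanitize_for_windows('what*is:this?.mp4'))  # what-is-this-.mp4
```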
@@ -96,3 +96,9 @@ def wtf(message, exit_code=1):
    print_log(message, RED, BOLD)
    if exit_code is not None:
        sys.exit(exit_code)

def yes_or_no(message):
    ans = str(input('%s (y/N) ' % message)).lower().strip()
    if ans == 'y':
        return True
    return False
32
src/you_get/util/os.py
Normal file
@@ -0,0 +1,32 @@
#!/usr/bin/env python

from platform import system

def detect_os():
    """Detect operating system.
    """

    # Inspired by:
    # https://github.com/scivision/pybashutils/blob/78b7f2b339cb03b1c37df94015098bbe462f8526/pybashutils/windows_linux_detect.py

    syst = system().lower()
    os = 'unknown'

    if 'cygwin' in syst:
        os = 'cygwin'
    elif 'darwin' in syst:
        os = 'mac'
    elif 'linux' in syst:
        os = 'linux'
        # detect WSL https://github.com/Microsoft/BashOnWindows/issues/423
        try:
            with open('/proc/version', 'r') as f:
                if 'microsoft' in f.read().lower():
                    os = 'wsl'
        except: pass
    elif 'windows' in syst:
        os = 'windows'
    elif 'bsd' in syst:
        os = 'bsd'

    return os
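The new `detect_os()` above returns lowercase identifiers ('linux', 'wsl', 'mac', 'windows', 'cygwin', 'bsd', or 'unknown'), which `legitimize()` now uses as its default `os` argument. A hedged usage sketch (output values depend on the machine it runs on):

```
# Assumes the you_get package from this commit is importable.
from you_get.util.os import detect_os
from you_get.util.fs import legitimize

print(detect_os())               # e.g. 'linux', 'wsl', 'mac', or 'windows'
print(legitimize('a:b|c?.mp4'))  # characters replaced according to the detected OS
```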
@@ -1,4 +1,4 @@
#!/usr/bin/env python

script_name = 'you-get'
__version__ = '0.4.1025'
__version__ = '0.4.1328'

@@ -7,6 +7,7 @@ from you_get.extractors import (
    magisto,
    youtube,
    bilibili,
    toutiao,
)


@@ -31,14 +32,6 @@ class YouGetTests(unittest.TestCase):
            info_only=True
        )

    def test_bilibili(self):
        bilibili.download(
            'https://www.bilibili.com/video/av16907446/', info_only=True
        )
        bilibili.download(
            'https://www.bilibili.com/video/av13228063/', info_only=True
        )


if __name__ == '__main__':
    unittest.main()

@@ -6,6 +6,7 @@ from you_get.util.fs import *

class TestUtil(unittest.TestCase):
    def test_legitimize(self):
        self.assertEqual(legitimize("1*2", os="Linux"), "1*2")
        self.assertEqual(legitimize("1*2", os="Darwin"), "1*2")
        self.assertEqual(legitimize("1*2", os="Windows"), "1-2")
        self.assertEqual(legitimize("1*2", os="linux"), "1*2")
        self.assertEqual(legitimize("1*2", os="mac"), "1*2")
        self.assertEqual(legitimize("1*2", os="windows"), "1-2")
        self.assertEqual(legitimize("1*2", os="wsl"), "1-2")

@@ -25,6 +25,7 @@
    "Programming Language :: Python :: 3.4",
    "Programming Language :: Python :: 3.5",
    "Programming Language :: Python :: 3.6",
    "Programming Language :: Python :: 3.7",
    "Topic :: Internet",
    "Topic :: Internet :: WWW/HTTP",
    "Topic :: Multimedia",